# load libraries used in this notebook
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# import data
url="https://raw.githubusercontent.com/amoreira2/Lectures/main/assets/data/GlobalFinMonthly.csv"
Data = pd.read_csv(url,na_values=-99)
Data['Date']=pd.to_datetime(Data['Date'])
Data=Data.set_index(['Date'])
6.3. The choice of frequency and Annualization of returns#
Data is always structured at a particular frequency, daily, monthly…
For example, the data set “Data” that we have been working with is at the “monthly” frequency
So the returns there tell us what you would have earned if you bought a particular asset at the close of the last trading day of the month (say January 31) and sold at the close of the last trading day of the next month (say February 28).
But this frequency choice is entirely arbitrary, since transactions happen every millisecond.
In this course we will work at the monthly or daily frequency, since that is what most practitioners work with (with the exception, of course, of high-frequency trading funds).
It also keeps things manageable, as you will quickly see that the data set can get very large once you go to higher frequencies.
One could argue that monthly is too short. Most people have one-year or even multi-year investment plans, so maybe it makes sense to look at the data at lower frequencies.
There is merit to this view, but you end up with much less data, so it is harder to make conclusive statements.
What we end up doing, both in academia and in industry, is to do our analysis at the monthly frequency and then extrapolate the results to yearly (and longer) horizons.
Now we will discuss how to do that.
It is much easier to keep yearly units in your head, so we will almost always annualize our results just to build intuition about what they mean.
Standard annualization (the quick and dirty way)
\(\hat{\mu}_A=12\times\hat{\mu}_M\)
\(\hat{\sigma}^2_A=12\times\hat{\sigma}^2_M\)
\(\hat{\sigma}_A=\sqrt{12}\times\hat{\sigma}_M\)
These formulas are exactly right if monthly returns are i.i.d. and annual returns are simply the sum of monthly returns (as is the case for log returns).
However, annual net returns are actually obtained by compounding,
\[ 1+r_A=\prod_{m=1}^{12}(1+r_m) \]
so if returns were i.i.d., the exact annualized average would be
\[ \mu_A=(1+\mu_M)^{12}-1 \]
and the variance is uglier still,
\[ \sigma^2_A=\left[(1+\mu_M)^2+\sigma^2_M\right]^{12}-(1+\mu_M)^{24} \]
and this still ignores time-variation in volatility, auto-correlations, and so on.
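To see how much this matters in practice, here is a quick comparison (a sketch, assuming `Data` is already loaded as above) of the quick-and-dirty annualization against the exact i.i.d. compounding formulas, applied to the market return.
# monthly moments
mu_m = Data.MKT.mean()
sig2_m = Data.MKT.var()
# quick and dirty annualization
mu_quick = 12*mu_m
sig_quick = (12*sig2_m)**0.5
# exact annualization under i.i.d. compounding
mu_exact = (1+mu_m)**12 - 1
sig_exact = (((1+mu_m)**2 + sig2_m)**12 - (1+mu_m)**24)**0.5
[mu_quick, mu_exact, sig_quick, sig_exact]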
However, we will always use the standard annualization
\(\hat{\mu}_A=12\times\hat{\mu}_M\)
\(\hat{\sigma}^2_A=12\times\hat{\sigma}^2_M\)
\(\hat{\sigma}_A=\sqrt{12}\times\hat{\sigma}_M\)
# simply multiply by number of months in a year
Data.MKT.mean()*12
0.10862132921174653
# for the standard deviation you multiply by the square root of the number of periods, since the variance grows with T
Data.MKT.var()*12
0.023235325364506825
Data.MKT.std()*12**0.5
0.15243137919899177
If it is wrong, why do we use it?
Because it is the standard.
It gives a good idea of annual magnitudes.
It allows you to compare across assets pretty well.
It makes it easy to get t-stats from monthly data (see the sketch after this list).
It is fine as long as you don't compare returns across frequencies, e.g., annual data for real estate and monthly data for stocks.
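As an illustration of the t-stat point, here is a minimal sketch (again assuming `Data` is loaded as above) of the standard t-statistic for whether the average monthly market return is different from zero.
# t-statistic for the mean monthly market return: t = mean / (std / sqrt(T))
T = Data.MKT.count()
t_stat = Data.MKT.mean() / (Data.MKT.std() / T**0.5)
t_stat
Note that applying the quick annualization to both the mean (times 12) and the standard deviation (times \(\sqrt{12}\)), while counting time in years instead of months, leaves this t-statistic unchanged, which is part of why the approximation is so convenient.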
6.3.1. Changing the dataset frequency#
Suppose that you don’t want to do this approximation. What do you do?
You have to aggregate the monthly data set to the frequency of your choice (say yearly).
This makes no assumptions and is always correct.
For that we will have to learn the `groupby` method.
Conceptually, what do we need to do?
For every year we want to compute the cumulative returns
In the end we want a table that looks like this:

| year | MKT… |
|---|---|
| 1997 | \(r_{1997}\) |
| 1998 | \(r_{1998}\) |
| 1999 | \(r_{1999}\) |
We add 1 to the net return to transform it into a gross return, which allows us to compound it.
If you invest 1 dollar at the start of month 1 and the returns are 0.2 in month 1 and 0.1 in month 2, how many dollars do you have at the end of month 2?
\[ \$(1+r_1)(1+r_2)=\$1\times(1+0.2)\times(1+0.1)=\$1.32 \]
What was your net cumulative return between the start of month 1 and the end of month 2?
\[ (1+r_1)(1+r_2)-1=(1+0.2)\times(1+0.1)-1=0.32 \]
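A quick numerical check of this two-month example in code (the numbers are just the hypothetical 20% and 10% returns from above):
# compound two hypothetical monthly net returns of 20% and 10%
r1, r2 = 0.2, 0.1
dollars = 1*(1+r1)*(1+r2)   # dollar value at the end of month 2 (about 1.32)
net_cum = dollars - 1       # net cumulative return over the two months (about 0.32)
[dollars, net_cum]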
We then just need to compound all the monthly returns of a given year to obtain the yearly return.
Let’s do it step by step:
step 1: \((1+r_t)\)
(Data+1)
step 2: group by year
(Data+1).groupby(Data.index.year)
step 3: gross return within each year using the function `prod()`
(Data+1).groupby(Data.index.year).prod()
step 4: subtract 1 to get the net return
(Data+1).groupby(Data.index.year).prod()-1
The key here is the `groupby` method. This is a super useful method that we will use more later on.
For more on it, please look at https://amoreira2.github.io/quantitativeinvesting/chapters/pandas/groupby.html
Datayear=(Data+1).groupby(Data.index.year).prod()-1
Datayear.head()
| Date | RF | MKT | USA30yearGovBond | EmergingMarkets | WorldxUSA | WorldxUSAGovBond |
|---|---|---|---|---|---|---|
| 1963 | 0.028564 | 0.150037 | -0.000649 | -0.030022 | 0.012480 | 0.029684 |
| 1964 | 0.035257 | 0.160619 | 0.044469 | -0.062168 | -0.034879 | 0.022714 |
| 1965 | 0.039186 | 0.144355 | 0.011539 | 0.100006 | 0.029518 | 0.030502 |
| 1966 | 0.047503 | -0.087400 | -0.020147 | -0.033125 | -0.109079 | 0.003024 |
| 1967 | 0.041986 | 0.286824 | -0.070192 | 0.106930 | 0.134399 | 0.015742 |
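As an aside, since the index is a `DatetimeIndex`, pandas' `resample` offers an equivalent route to the same annual compounding (this is just an alternative, not part of the step-by-step above); the only difference is that the index becomes year-end dates rather than integer years, and the frequency alias ('A', 'Y', or 'YE') depends on your pandas version.
# equivalent aggregation using resample on the DatetimeIndex
# ('A' works on older pandas; recent versions prefer 'YE')
Datayear_alt = (Data+1).resample('A').prod()-1
Datayear_alt.head()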
# we could just as well calculate the standard deviation, the mean, the median, the min, the max, or even apply some customized function
# in our case all we need is the product
[Datayear.MKT.mean(),Data.MKT.mean()*12,Datayear.MKT.std(),Data.MKT.std()*12**0.5]
[0.1155945787163523,
0.10862132921174653,
0.17219331914550604,
0.15243137919899177]
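For example (a sketch of the point in the comment above; the particular statistics here are just for illustration), `agg` lets you compute several within-year statistics of the monthly returns in one call:
# several within-year statistics of the monthly market return in one call
Data.MKT.groupby(Data.index.year).agg(['mean','std','min','max']).head()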
In the problem sets I will always ask you about annual numbers.
You should always go for the quick and dirty annualization unless told otherwise.