17.28. Assignment 6#

Instructions: This problem set should be done in a group.

Your group was assigned at orientation; you can also find it on Blackboard.

You will use this same group to do your final project.

Answer each question in the designated space below.

After you are done, save the file and upload it to Blackboard.

Please check that you are submitting the correct file. One way to avoid mistakes is to save it with a different name.

17.29. Names of your group members#

Please write the names below.

  • [Name]:

  • [Name]:

  • [Name]:

  • [Name]:

  • [Name]:

17.30. Discussion Forum Assignment#

Complete the value investing assignment in the discussion forum:

https://edstem.org/us/courses/30665/discussion/1987699

17.31. Exercises#

1. Data Cleaning

You will work with the same dataset as in Assignment 5. The dataset is located at:

url='https://github.com/amoreira2/Lectures/blob/main/assets/data/Assignment5.xlsx?raw=true'

Do the following:

  • Import pandas, numpy, matplotlib, and load the data set.

  • Import the datasets of industry returns and risk free rate.

  • Parse the date.

  • Set the index.

  • Drop missing observations.

  • Construct a dataframe with only excess returns.

  • Call this dataframe with the 49 excess returns time series df.

  • Call df.head() to check that everything works.

Hint: You did it in assignment 5, simply copy and paste your code.
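As a sketch of the cleaning steps above (the real workbook's sheet names and column layout may differ, so the raw frame below is a synthetic stand-in rather than the actual Assignment 5 data):

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Illustrative stand-in for the Excel file: with the real data you would
# call pd.read_excel(url, ...) as in Assignment 5.
raw = pd.DataFrame({
    'Date': ['202001', '202002', '202003'],
    'Agric': [1.2, np.nan, 0.8],
    'Food': [0.5, 0.7, -0.3],
    'RF': [0.1, 0.1, 0.1],
})

# Parse the date (YYYYMM strings -> end-of-month timestamps) and set the index.
raw['Date'] = pd.to_datetime(raw['Date'], format='%Y%m') + MonthEnd(0)
raw = raw.set_index('Date')

# Drop missing observations.
raw = raw.dropna()

# Excess returns: industry return minus the risk-free rate.
df = raw.drop(columns='RF').sub(raw['RF'], axis=0)
print(df.head())
```

With the real file, the synthetic `raw` frame is replaced by the industry-return and risk-free sheets you imported in Assignment 5; the parse/index/dropna/excess-return steps are the same.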

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
from pandas.tseries.offsets import MonthEnd

# your code below

df.head()

2. Expected excess return estimation

Compute the sample means as the estimators for the expected excess returns of the 49 assets.

Call this ERe.

# your code below

ERe.head()

3. Expected excess return uncertainty

We will now construct an estimator for the amount of uncertainty in our sample mean estimator. If we assume that each asset's returns are uncorrelated over time (not a terrible assumption), then the variance of the mean is

\[var(\bar{r}_i) = var\left(\frac{\sum_{t=1}^T r_{i,t}}{T}\right)=\frac{\sum_{t=1}^T var(r_{i,t})}{T^2} = \frac{var(r_{i,t})}{T}\]

So all you need is the sample size (T) and the variance of each asset to obtain the variance of our estimator.

Please use this formula to compute the STANDARD DEVIATION of the sample-average estimator for each of the 49 assets. Call this ERe_se.
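Putting Exercises 2 and 3 together, the estimator and its standard error take one line each. The sketch below uses a simulated stand-in for df, since the real one comes from Exercise 1:

```python
import numpy as np
import pandas as pd

# Simulated stand-in for the 49-industry excess-return dataframe df.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(0.5, 5.0, size=(600, 3)),
                  columns=['Agric', 'Food', 'Soda'])

ERe = df.mean()                  # sample mean of each asset
T = df.shape[0]                  # sample size
ERe_se = df.std() / np.sqrt(T)   # sd of the sample mean: sqrt(var(r)/T)
print(ERe_se.head())
```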

# your code below

ERe_se.head()

4. Constructing a confidence interval for the expected excess return, part 1

We will now want to construct the 95% confidence interval for our estimator. The interval is such that it contains the true mean 95% of the time.

The way to do this is to use the normal distribution CDF to find the threshold that leaves only 2.5% probability in each tail.

Why 2.5% and not 5%? Because the interval is symmetric: with 2.5% probability in the left tail and 2.5% in the right tail, there is only a 5% probability that the expected return falls outside the interval, and hence a 95% probability that it falls inside.

In this exercise, you will find the threshold as follows:

  1. import the stats library from the scipy package with from scipy import stats

  2. get the standard normal distribution with sn=stats.norm(0,1), where 0 is the mean and 1 is the standard deviation

  3. get the threshold by using the inverse cumulative distribution function for the appropriate prob_value to create a 95% CI (see the discussion above): threshold=sn.isf(prob_value)

  4. make sure that this threshold is positive (if you took it from the left tail, take the absolute value, or just use the right tail).

  5. make sure you did things correctly by calling print(threshold).

Hint:

You can always check against a normal table that you did things correctly: https://en.wikipedia.org/wiki/Normal_distribution.
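The four steps above collapse to a few lines; note that sn.isf works on the right tail, so the result is already positive:

```python
from scipy import stats

sn = stats.norm(0, 1)           # standard normal distribution
prob_value = 0.025              # 2.5% in each tail for a 95% CI
threshold = sn.isf(prob_value)  # inverse survival function: right-tail cutoff
print(threshold)                # about 1.96
```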

# your code below

print(threshold)

5. Constructing a confidence interval for the expected excess return, part 2

Armed with this threshold, you can construct the interval as follows:

\[[\bar{r}-\text{threshold}\times\sigma(\bar{r}),\; \bar{r}+\text{threshold}\times\sigma(\bar{r})]\]

Do the following:

  1. create an empty dataframe with the names of the industries as the index and ‘lower’ and ‘upper’ as column names. Name it ERe_ci.

  2. construct the lower bound of the interval, \(\bar{r}-threshold\times\sigma(\bar{r})\), and store it in the column of ‘lower’

  3. compute the upper bound symmetrically, and store it in the column of ‘upper’

  4. call ERe_ci.head()
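A minimal sketch of the interval construction, using made-up ERe and ERe_se values for three industries (the real ones come from Exercises 2-4):

```python
import numpy as np
import pandas as pd

# Hypothetical estimates for three industries.
ERe = pd.Series([0.6, 0.8, 0.5], index=['Agric', 'Food', 'Soda'])
ERe_se = pd.Series([0.2, 0.25, 0.3], index=ERe.index)
threshold = 1.96

# Empty frame with industries as index, then fill the two bounds.
ERe_ci = pd.DataFrame(index=ERe.index, columns=['lower', 'upper'], dtype=float)
ERe_ci['lower'] = ERe - threshold * ERe_se
ERe_ci['upper'] = ERe + threshold * ERe_se
print(ERe_ci.head())
```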

# your code below

ERe_ci.head()

6. Compute the tangency portfolio weights for a portfolio with annualized volatility of 10%

Store these in a dataframe whose rows are the names of the assets and the first column has the label ‘mve_data’.

Name this data frame Weights and display it with print(Weights).

TIP: You did this in Assignment 5.
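One common recipe from mean-variance analysis is to take the direction CovRe⁻¹ ERe and rescale it to the target volatility. The sketch below uses simulated estimates and assumes monthly returns quoted in percent, so a 10% annualized vol corresponds to a monthly vol of 10/sqrt(12):

```python
import numpy as np
import pandas as pd

# Simulated stand-ins for the estimates ERe and CovRe from Exercise 2.
rng = np.random.default_rng(1)
names = ['Agric', 'Food', 'Soda']
R = rng.normal(0.5, 5.0, size=(600, 3))
ERe = pd.Series(R.mean(axis=0), index=names)
CovRe = pd.DataFrame(np.cov(R, rowvar=False), index=names, columns=names)

# Unscaled tangency direction: inverse covariance times mean.
w = np.linalg.solve(CovRe.values, ERe.values)

# Rescale so annualized vol is 10% (monthly target 10/sqrt(12), percent units).
vol_target = 10 / np.sqrt(12)
w = w * vol_target / np.sqrt(w @ CovRe.values @ w)

Weights = pd.DataFrame({'mve_data': w}, index=names)
print(Weights.head())
```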

# your code below

print(Weights.head())

7. Sensitivity to uncertainty of the Tangent portfolio calculation

Now we will compute the tangency portfolio but using a slightly different estimate for the mean.

Do the following:

  1. instead of using the sample mean for every asset, pick one asset: Hlth.

  2. change its mean to the lower bound of its CI from Exercise 5 and then recalculate the tangency portfolio weights.

  3. store these in the dataframe Weights under the column name mve_Hlth-1.95.

  4. create another column, mve_Hlth+1.95, with the weights computed from the perturbation in which the mean is changed to the upper bound of the CI.

  5. do a bar plot of these three sets of weights using Weights.plot.bar().

Discuss what you notice in the bar plot:

  1. How much do the weights change?

  2. Which assets are impacted? Why?

Hint:

You might want to create a copy of your ERe estimator before you do the perturbation.
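The perturbation step can be sketched as follows, with hypothetical three-asset inputs in place of the real ERe, CovRe, and ERe_ci, and an assumed tangency helper (your Exercise 6 code plays that role):

```python
import numpy as np
import pandas as pd

# Hypothetical inputs; Exercises 2, 5, and 6 produce the real ones.
names = ['Hlth', 'Food', 'Soda']
ERe = pd.Series([0.6, 0.8, 0.5], index=names)
CovRe = pd.DataFrame(np.diag([25.0, 16.0, 36.0]), index=names, columns=names)
ERe_ci = pd.DataFrame({'lower': ERe - 0.4, 'upper': ERe + 0.4})

def tangency(mu, cov, vol_target=10 / np.sqrt(12)):
    """Tangency weights scaled to a monthly vol target (returns in percent)."""
    w = np.linalg.solve(cov.values, mu.values)
    return w * vol_target / np.sqrt(w @ cov.values @ w)

Weights = pd.DataFrame(index=names)
Weights['mve_data'] = tangency(ERe, CovRe)

for col, bound in [('mve_Hlth-1.95', 'lower'), ('mve_Hlth+1.95', 'upper')]:
    mu = ERe.copy()                        # copy ERe before perturbing
    mu['Hlth'] = ERe_ci.loc['Hlth', bound]
    Weights[col] = tangency(mu, CovRe)

print(Weights)  # Weights.plot.bar() then compares the three columns
```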

# your code below
# your discussion below

# 1. How much do the weights change?

# 2. Which assets are impacted? Why?

8. Performance impact of estimation uncertainty

Your Weights dataframe has 3 different weight schemes.

Do the following:

  1. compute the in-sample Sharpe Ratio for these 3 different weight schemes.

  2. discuss the results you obtain.

Hint: You should use the real data (e.g., df, ERe, CovRe) to compute the Sharpe ratio.
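For a weight vector w, the in-sample monthly Sharpe ratio is w'ERe / sqrt(w'CovRe w) (annualize by sqrt(12) if desired). A sketch with hypothetical inputs standing in for the real ERe, CovRe, and Weights:

```python
import numpy as np
import pandas as pd

# Hypothetical estimates; Exercises 2 and 6-7 produce the real ones.
names = ['Hlth', 'Food', 'Soda']
ERe = pd.Series([0.6, 0.8, 0.5], index=names)
CovRe = pd.DataFrame(np.diag([25.0, 16.0, 36.0]), index=names, columns=names)
Weights = pd.DataFrame({'mve_data': [0.28, 0.58, 0.16]}, index=names)

# Annualized in-sample Sharpe ratio of each weight scheme.
SRs = {}
for col in Weights.columns:
    w = Weights[col].values
    SRs[col] = (w @ ERe.values) / np.sqrt(w @ CovRe.values @ w) * np.sqrt(12)
print(SRs)
```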

# your code below
# your discussion below

9. Reproduce the analysis of Exercises 7-8 for all assets

Do the following:

  1. use a for loop to loop through the 49 portfolios and create the “perturbed” weights and the Sharpe Ratio of the perturbed weights.

  2. record for each asset the average drop in the Sharpe Ratio associated with the perturbation in the tangency portfolio weights.

  3. store the results in a dataframe named dSR (difference in SR): \(dSR[asset]=\frac{1}{2}\frac{SR(asset+1.95)+SR(asset-1.95)}{SR(data)}\), where SR(asset+1.95) and SR(asset-1.95) are the Sharpe ratios obtained when you perturb the expected excess return of that asset to the upper and lower bound of the CI. dSR should be a dataframe with industry names as the index and one column, called SR_change, containing the results of the expression above.

  4. do a bar plot of this Sharpe ratio change.

Discuss the bar plot:

  1. What do you think is the key takeaway from the analysis above?

Hint: Note that all you need to do here is take the code you developed above and adapt it to work with a for loop.
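The loop can be sketched as follows, again with hypothetical three-asset inputs. The perturbed weights are always evaluated at the original estimates, so SR_change can never exceed 1 (the tangency portfolio maximizes the in-sample SR):

```python
import numpy as np
import pandas as pd

# Hypothetical inputs; Exercises 2, 5, and 6 produce the real ones.
names = ['Hlth', 'Food', 'Soda']
ERe = pd.Series([0.6, 0.8, 0.5], index=names)
CovRe = pd.DataFrame(np.diag([25.0, 16.0, 36.0]), index=names, columns=names)
ERe_ci = pd.DataFrame({'lower': ERe - 0.4, 'upper': ERe + 0.4})

def tangency(mu):
    """Tangency direction for mean vector mu (scale is irrelevant for the SR)."""
    return np.linalg.solve(CovRe.values, mu.values)

def sharpe(w):
    """In-sample monthly SR of weights w, evaluated at the ORIGINAL estimates."""
    return (w @ ERe.values) / np.sqrt(w @ CovRe.values @ w)

SR_data = sharpe(tangency(ERe))
dSR = pd.DataFrame(index=names, columns=['SR_change'], dtype=float)
for asset in names:
    srs = []
    for bound in ['lower', 'upper']:
        mu = ERe.copy()                      # copy before perturbing
        mu[asset] = ERe_ci.loc[asset, bound]
        srs.append(sharpe(tangency(mu)))
    dSR.loc[asset, 'SR_change'] = 0.5 * sum(srs) / SR_data

print(dSR)  # dSR.plot.bar() shows the SR change per asset
```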

# your code below
    
# your discussion below

10. Monte Carlo, part 1

So far our focus has been on the estimation uncertainty of risk premiums. Covariance matrices also need to be estimated.

You will now implement a Monte-Carlo method to evaluate the overall uncertainty in the construction of the tangency portfolio.

You already have the sample estimates of the vector of expected excess returns, ERe, and of the variance-covariance matrix, CovRe (you used these in Exercise 6).

Now use the function np.random.multivariate_normal to simulate draws from a multivariate normal distribution with mean vector equal to ERe and covariance matrix equal to CovRe.

Do the following:

  1. write the code that draws ONE realization of returns for this set of 49 assets.

Hint:

You should get a 49-by-1 vector that changes every time you run the cell.

Type np.random.multivariate_normal? to see how this function works.
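A minimal example with a hypothetical three-asset mean and covariance in place of ERe and CovRe (the real ones are 49-dimensional):

```python
import numpy as np

# Hypothetical stand-ins for ERe and CovRe.
mu = np.array([0.5, 0.7, 0.4])
cov = np.diag([25.0, 16.0, 36.0])

draw = np.random.multivariate_normal(mu, cov)  # one simulated month of returns
print(draw)  # a different realization on every run
```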

# your code below

11. Monte Carlo, part 2

Do the following:

  1. now set the parameter size in np.random.multivariate_normal to draw T realizations of the 49 assets, where T is the number of months in the data set.

  2. print the shape of your draw. It should be a T-by-N matrix of returns, with exactly the same shape as our data set.

Hint: every time you run the cell again you get a different realization.
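The only change from the previous sketch is the size argument:

```python
import numpy as np

mu = np.array([0.5, 0.7, 0.4])     # stand-in for ERe
cov = np.diag([25.0, 16.0, 36.0])  # stand-in for CovRe
T = 600                            # number of months in the data set

sample = np.random.multivariate_normal(mu, cov, size=T)
print(sample.shape)  # (600, 3): T rows, one column per asset
```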

# your code below

12. Monte Carlo, part 3

Do the following:

  1. copy the code above, so you have a simulated sample of monthly industry returns.

  2. use the simulated return data and the weights in mve_data column of the dataframe Weights to construct a time-series of portfolio excess return.

  3. compute and print its Sharpe Ratio.

Hint:

Every time you run this cell you should get a different Sharpe Ratio. This variation reflects the amount of overall uncertainty built into our investment strategy.
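A sketch of the full simulate-then-evaluate step, with hypothetical stand-ins for ERe, CovRe, and the mve_data weights:

```python
import numpy as np

mu = np.array([0.5, 0.7, 0.4])     # stand-in for ERe
cov = np.diag([25.0, 16.0, 36.0])  # stand-in for CovRe
w = np.array([0.28, 0.58, 0.16])   # stand-in for Weights['mve_data']
T = 600

sim = np.random.multivariate_normal(mu, cov, size=T)  # simulated sample
port = sim @ w                                        # portfolio excess returns
SR = port.mean() / port.std() * np.sqrt(12)           # annualized Sharpe ratio
print(SR)  # changes on every run
```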

# your code below

13. Monte Carlo, part 4

Now copy the code from the question above and wrap a for loop around it.

Do the following:

  1. loop through this code 1000 times and record the resulting Sharpe Ratio each time.

  2. save these in a dataframe called MC.

  3. create a histogram of these Sharpe Ratios with 50 bins using the method .hist.
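The loop can be sketched as follows, with the same hypothetical stand-ins as above (the Agg backend line is only there so the sketch runs headless):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless-safe plotting backend for this sketch

mu = np.array([0.5, 0.7, 0.4])     # stand-in for ERe
cov = np.diag([25.0, 16.0, 36.0])  # stand-in for CovRe
w = np.array([0.28, 0.58, 0.16])   # stand-in for Weights['mve_data']
T = 600

SRs = []
for _ in range(1000):              # 1000 Monte-Carlo samples
    sim = np.random.multivariate_normal(mu, cov, size=T)
    port = sim @ w
    SRs.append(port.mean() / port.std() * np.sqrt(12))

MC = pd.DataFrame({'SR': SRs})
MC.hist(bins=50)                   # distribution of simulated Sharpe ratios
```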

Discuss the plot:

  1. what do you conclude?

# your code below
# your discussion below

14. Bootstrap

The Monte-Carlo approach assumes that the distribution of returns is normal, which is a good approximation but not literally true. It turns out that we can use another approach, called the bootstrap, that samples from the actual data instead of from a normal distribution.

Basically, the bootstrap approach randomly draws one observation (one month of 49 industry returns in our case) from the real dataset, treats it as a new observation in the simulated sample, and repeats this process until the simulated sample reaches the needed size. Observations are drawn with replacement, which means we draw from the whole real dataset every time, so it is very likely that some data points are drawn more than once.

Do the following:

  1. start with your code from question 13,

  2. instead of calling np.random.multivariate_normal to draw a sample, use the dataframe method sample with replacement.

  3. now you can simply plug this realization into your code from question 13.

  4. save the results in a dataframe called Boot.

  5. create a histogram of these Sharpe Ratios with 50 bins using the method .hist
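The only change relative to the Monte-Carlo loop is the sampling line: df.sample(n=T, replace=True) resamples whole months with replacement. Sketch with a simulated stand-in for df:

```python
import numpy as np
import pandas as pd

# Simulated stand-in for the real excess-return dataframe df.
rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(0.5, 5.0, size=(600, 3)),
                  columns=['Agric', 'Food', 'Soda'])
w = np.array([0.28, 0.58, 0.16])   # stand-in for Weights['mve_data']
T = len(df)

SRs = []
for _ in range(1000):
    boot_sample = df.sample(n=T, replace=True)  # resample months with replacement
    port = boot_sample.values @ w
    SRs.append(port.mean() / port.std() * np.sqrt(12))

Boot = pd.DataFrame({'SR': SRs})
print(Boot['SR'].describe())  # Boot.hist(bins=50) draws the histogram
```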

Compare the results in 13 and 14, and explain:

  1. does the key takeaway change?

  2. does the distribution of the SR of our strategy change?

# your code below
# your discussion below

15. Please explain why an investor should care about these results.

# your answer below