16.12. Assignment 3#

16.13. Write your name and Simon email#

Please write your names below:

  1. [Name, student ID, email]:

  2. [Name, student ID, email]:

  3. [Name, student ID, email]:

16.14. Exercises#

1. Data Cleaning

You will work with the same dataset as in Assignment 1, located at

url='https://github.com/amoreira2/Fin418/blob/main/assets/data/Assignment1.xlsx?raw=true'

Do the following:

  • Import pandas, numpy, matplotlib, and load the data set.

  • Import the datasets of industry returns and risk free rate.

  • Parse the date.

  • Set the index.

  • Drop missing observations.

  • Construct a dataframe with only excess returns.

  • Call this dataframe with the 49 excess returns time series df.

  • Call df.head() to check if everything works

Hint: You did it in assignment 1 (the dataframe name was different), simply copy and paste your code.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
from pandas.tseries.offsets import MonthEnd

# your code below

df.head()

2. Expected excess return estimation

Compute the sample mean as the estimators for the expected excess returns of the 49 assets.

Call this ERe.

Hint: Use risk-free rates from the “Market_proxy” sheet.

# your code below

ERe.head()

3. Expected excess return uncertainty

We will now construct an estimator for the amount of uncertainty in our sample mean estimator. If we assume that each individual asset is uncorrelated over time (not a terrible assumption), then the variance of the mean is

\[var(\bar{r}_i) = var\left(\frac{\sum_{t=1}^T r_{i,t}}{T}\right)=\frac{\sum_{t=1}^T var(r_{i,t})}{T^2} = \frac{var(r_{i,t})}{T}\]

So all you need are the sample size (T) and the variance of each asset to obtain the variance of our estimator.
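As a quick illustration (with made-up numbers, not the assignment data), the formula says the standard error of the sample mean is the return volatility divided by \(\sqrt{T}\):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
T = 600
# Synthetic monthly excess returns for one hypothetical asset
r = pd.Series(rng.normal(0.01, 0.05, size=T))

# Standard deviation of the sample-mean estimator: sd(r) / sqrt(T)
se_mean = r.std() / np.sqrt(T)
print(se_mean)
```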

Please use this formula to compute the STANDARD DEVIATION of the sample average estimator for each of the 49 assets. Call this ERe_se.

# your code below

ERe_se.head()

4. Constructing a confidence interval for the expected excess return, part 1

We now want to construct the 95% confidence interval for our estimator, that is, an interval that contains the true mean 95% of the time.

The way to do this is to use the normal distribution CDF to figure out the threshold that leaves only 2.5% probability in each tail.

Why 2.5% and not 5%? Because the interval is symmetric: 2.5% probability in the left tail plus 2.5% in the right tail means there is only 5% probability that the expected return falls outside the interval, and thus 95% probability that it falls inside.

In this exercise, you will find the threshold by doing the following:

  1. import the stats library from the scipy package with from scipy import stats

  2. get the standard normal distribution with sn=stats.norm(0,1), where 0 is the mean and 1 is the standard deviation

  3. get the threshold by using inverse cumulative distribution function for the appropriate prob_value to create a 95% CI (see discussion above). threshold=sn.isf(prob_value)

  4. make sure that this threshold is positive (if you got it from the left tail, take the absolute value, or simply use the right tail).

  5. make sure you did things correctly by calling print(threshold).

Hint:

You can always check in the normal table that you did things correctly https://en.wikipedia.org/wiki/Normal_distribution.
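Following the steps above, a minimal sketch of the threshold calculation (this recovers the familiar 1.96):

```python
from scipy import stats

# Standard normal distribution
sn = stats.norm(0, 1)

# isf is the inverse survival function: the point with 2.5% probability
# to its right, i.e. the positive threshold of a 95% CI
threshold = sn.isf(0.025)
print(round(threshold, 2))
```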

# your code below

print(threshold)

5. Constructing a confidence interval for the expected excess return, part 2

Armed with this threshold you can construct the interval as follows:

\[[\bar{r}-threshold\times\sigma(\bar{r}),\bar{r}+threshold\times\sigma(\bar{r})]\]

Do the following:

  1. create an empty dataframe with the names of the industries as index and ‘lower’ and ‘upper’ as column names. Name it ERe_ci.

  2. construct the lower bound of the interval, \(\bar{r}-threshold\times\sigma(\bar{r})\), and store it in the column of ‘lower’

  3. compute the upper bound symmetrically, and store it in the column of ‘upper’

  4. call ERe_ci.head()
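The construction above can be sketched with made-up means and standard errors (the industry names and numbers are fabricated for illustration):

```python
import pandas as pd

# Toy inputs: two fake industries with made-up moments
ERe = pd.Series([0.006, 0.008], index=['IndA', 'IndB'])
ERe_se = pd.Series([0.002, 0.003], index=['IndA', 'IndB'])
threshold = 1.96

# Symmetric 95% confidence interval around the sample mean
ERe_ci = pd.DataFrame(index=ERe.index, columns=['lower', 'upper'])
ERe_ci['lower'] = ERe - threshold * ERe_se
ERe_ci['upper'] = ERe + threshold * ERe_se
print(ERe_ci)
```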

# your code below

ERe_ci.head()

6. Compute the tangency portfolio weights for a portfolio with annualized volatility of 10%

A very important portfolio is the Mean-Variance-Efficient (MVE) portfolio, sometimes also called the tangency portfolio. This portfolio plays an outsized role in finance theory: any model of asset prices can be characterized as a prediction for which portfolio is MVE. The CAPM, for example, predicts that the market portfolio is MVE.

For us, we will simply discuss its empirical implementation. It turns out that if you know the true moments of the assets' excess return distributions, \(E[R^e]\) and \(Var(R^e)\), then the portfolio that maximizes the Sharpe ratio has weights proportional to

\[Var(R^e)^{-1}E[R^e]\]

So for any number \(x\), the portfolio \(W=x\,Var(R^e)^{-1}E[R^e]\) has the highest feasible Sharpe ratio.

Let’s construct this portfolio using the sample moments; this yields the static portfolio that had the highest Sharpe ratio in the sample.
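As a sketch of the formula (a made-up two-asset example, not the assignment data):

```python
import numpy as np

# Toy two-asset example with fabricated moments
ERe = np.array([0.005, 0.008])
CovRe = np.array([[0.0025, 0.0010],
                  [0.0010, 0.0040]])

# Weights proportional to Var(Re)^{-1} E[Re]
raw_w = np.linalg.inv(CovRe) @ ERe

# Choose x so the weights sum to 1
w = raw_w / raw_w.sum()
print(w)
```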

  1. you already estimated E[Re]

  2. Now use .cov() to estimate the variance-covariance matrix and save it as CovRe

  3. Invert the variance-covariance matrix using np.linalg.inv()

  4. Construct the weights

  5. Choose x so that the weights add up to 1

Store these in a dataframe whose rows are the names of the assets and the first column has the label ‘mve_data’.

Name this data frame Weights and print it.

# your code below

# Your CovRe
CovRe = 

# Invert CovRe with np.linalg.inv() and plug invCovRe and ERe into the product below
invCovRe = 
raw_weights = invCovRe @ ERe

# Normalize weights to sum to 1
x = 
norm_weights = x * raw_weights

# Store in DataFrame
Weights = 

print(Weights.head())

7. Sensitivity to uncertainty of the Tangent portfolio calculation

Now we will compute the tangency portfolio but using a slightly different estimate for the mean.

Do the following:

  1. instead of using the sample mean for each asset, first we will pick one asset, Hshld.

  2. change its mean to be its lower bound of CI (confidence interval) in Exercise 5 and then recalculate the tangency portfolio weights. (That is, change the mean of Hshld ONLY and go through the same process as exercise 6 to recalculate the portfolio weights for ALL assets.)

  3. store this in the dataframe Weights under the column name mve_Hshld-1.95.

  4. create another column with weights computed from the perturbation in which the mean of Hshld is changed to the upper bound of the CI; label this column mve_Hshld+1.95.

  5. do a bar plot of these three sets of weights using Weights.plot.bar().

Discuss what you notice in the bar plot:

  1. How much do the weights change?

  2. Which assets are impacted? Why?

Hint:

you might want to create a copy of your ERe estimator before you do the perturbation.

# your code below

# Make copies of the ERe dataframe and name them ERe_lower and ERe_upper
ERe_lower = ERe.copy()
ERe_upper = ERe.copy()

# Replace Hshld's mean with lower and upper CI bounds
ERe_lower['Hshld'] = 
ERe_upper['Hshld'] = 

# Compute weights as in the above exercise and save the values of each in `Weights` dataframe

# Bar plot

your discussion below

  1. How much do the weights change?

  2. Which assets are impacted? Why?

8. Performance impact of estimation uncertainty

Your Weights dataframe now has three different weight schemes (three columns).

Do the following:

  1. compute the in-sample Sharpe Ratio for these 3 different weight schemes.

  2. discuss the results that you obtained

Hint: You should use the real data (e.g. df, ERe, CovRe) to compute the Sharpe ratio (not the perturbed means from exercise 7).
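One way to compute an in-sample Sharpe ratio from a weight vector and sample moments, sketched here with made-up numbers (a monthly Sharpe ratio is annualized by multiplying by \(\sqrt{12}\)):

```python
import numpy as np

# Fabricated moments and weights for illustration only
ERe = np.array([0.005, 0.008])
CovRe = np.array([[0.0025, 0.0010],
                  [0.0010, 0.0040]])
w = np.array([0.4, 0.6])

# Portfolio mean, volatility, and Sharpe ratio
port_mean = w @ ERe
port_vol = np.sqrt(w @ CovRe @ w)
sr_monthly = port_mean / port_vol
sr_annual = sr_monthly * np.sqrt(12)   # annualize the monthly Sharpe ratio
print(sr_monthly, sr_annual)
```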

# your code below

your discussion below

9. Reproduce analysis of Exercises 7-8 for all assets

Do the following:

  1. use a for loop to iterate over the 49 portfolios, creating the “perturbed” weights and the Sharpe ratio of the perturbed weights by repeating exercises 7 and 8.

  2. record for each asset the average drop in the Sharpe Ratio associated with the perturbation in the tangency portfolio weights.

  3. store the results in a dataframe named dSR (difference in SR): \(dSR[asset]=\frac{1}{2}\,\frac{(SR(asset+1.95)-SR)+(SR(asset-1.95)-SR)}{SR(data)}\), where SR(asset+1.95) and SR(asset-1.95) are the Sharpe ratios obtained when you perturb the expected excess return of that asset to the upper and lower bounds of the CI, and SR is the Sharpe ratio from the unperturbed data. dSR should be a dataframe with industry names as index and one column, called SR_change, containing the results of the calculation above.

  4. do a bar plot of this Sharpe ratio change.

Discuss the bar plot:

  1. What do you think is the key takeaway from the analysis above

Hint: Note that all you need to do here is to get the code you developed above and adapt it to work with a for loop.

Hint: When saving into dSR, you can use the .from_dict method of pd.DataFrame

# your code below
    
# It's useful to use a dictionary for this exercise
sr_changes = {}

# For each asset, do the same exercise
for asset in ERe.index:
    # Write a code calculating Sharpe ratios using above exercise

    # Compute average SR drop. Hint: Use the formula in the problem.
    sr_diff = 
    sr_changes[asset] = sr_diff

# Save it into DataFrame
dSR = 

# Bar plot

your discussion below

10. Monte Carlo, part 1

So far our focus has been on the estimation uncertainty of risk premiums. But covariance matrices also need to be estimated.

You will now implement a Monte-Carlo method to evaluate the overall uncertainty in the construction of the tangency portfolio.

You already have the sample estimates from the vector of expected excess returns ERe and the variance-covariance matrix CovRe (you used those in Exercise 6).

Now use the function np.random.multivariate_normal to simulate draws from a multivariate normal distribution with mean vector equal to ERe and covariance matrix equal to CovRe.

Do the following:

  1. write the code that draws ONE realization of returns for this set of 49 assets.

Hint:

you should get a 49-by-1 vector that changes every time you run the cell.

type np.random.multivariate_normal? to see how this function works
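A minimal sketch with a made-up two-asset mean and covariance (the size argument, which the next part uses, draws many realizations at once):

```python
import numpy as np

# Fabricated mean vector and covariance matrix; in the assignment these
# would be ERe and CovRe estimated from the data
mu = np.array([0.005, 0.008])
Sigma = np.array([[0.0025, 0.0010],
                  [0.0010, 0.0040]])

# One realization of the two assets
one_draw = np.random.multivariate_normal(mu, Sigma)

# 600 realizations at once: a T-by-N matrix
many_draws = np.random.multivariate_normal(mu, Sigma, size=600)
print(one_draw.shape, many_draws.shape)
```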

# your code below

11. Monte Carlo, part 2

Do the following:

  1. now set the parameter size in the multivariate_normal function to draw T realizations of the 49 assets, where T is the number of months in the data set.

  2. print the shape of your draw. This should return a T-by-N matrix of returns, i.e. something with exactly the same shape as our data set.

Hint: every time you run the cell again you get a different realization.

# your code below

12. Monte Carlo, part 3

Do the following:

  1. copy the code above, so you have a simulated sample of monthly industry returns.

  2. use the simulated return data and the weights in the mve_data column of the dataframe Weights to construct a time series of portfolio excess returns. That is, use the weights you constructed from the sample moments of the original sample.

  3. compute and print its Sharpe Ratio.

Hint:

Every time you run this cell you should get a different Sharpe Ratio. This variation reflects the amount of overall uncertainty built into our investment strategy.

# your code below
simulated_returns = 
w_mve = Weights['mve_data'].values

# Compute time-series of simulated portfolio returns
simulated_port_returns = 

# Compute Sharpe Ratio of the simulated portfolio

13. Monte Carlo, part 4

Now copy the code of the question above and write a for loop around it.

Do the following:

  1. loop through this code 1000 times, recording the resulting Sharpe Ratio each time.

  2. save this in a dataframe called MC.

  3. create a histogram of these Sharpe Ratios with 50 bins using the .hist method

Discuss the plot:

  1. what do you conclude?

# your code below

# Prepare saving space
mc_sharpes = []

# Run simulation 1000 times
for i in range(1000):
    

your discussion below

14. Please explain why an investor should care about these results.

your answer below