16.6. Assignment 1#

16.7. Write your name and simon email#

Please write names below

Please write names below

  1. [Name, student ID, email]:

  2. [Name, student ID, email]:

  3. [Name, student ID, email]:

16.8. Exercises#

Exercise 1

Start by importing pandas, numpy, maplotlib, and loading the data set.

The dataset has address

url='https://github.com/amoreira2/Fin418/blob/main/assets/data/Assignment1.xlsx?raw=true'

I strongly recommend you download first and look at the data set.

This file contains multiple sheets, you should use read_excel to get the data that contains the 49 value-weighted industry portfolios.

See here:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html .

Do the followings:

  1. Import this dataframe as df_ind

    • Use “sheet_name” to select the desired excel sheet.

    • Use “skip_rows” to skip the initial rows before the data.

    • Figure out what is the code for missing value and change the option na_values appropriately

    • If you look at the excel file you will see that there are other data sets stacked horizontally. Use the usecols option to select the range of columns you want imported

    • You can take a look at Lab3 material for some examples of importing excel data and using these options.

  2. Change the name of the column with the date information to date

  3. Use to_datetime so python understand the column date as a datetime object (you will have to use the option format)

  4. Set date as index

  5. Call df_ind.info() so you check all the tasks were accomplished.

  6. In the next cell, call df_ind.head()

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
from pandas.tseries.offsets import MonthEnd

url = 'https://github.com/amoreira2/Fin418/blob/main/assets/data/Assignment1.xlsx?raw=true'

# Import the data
df_ind = pd.read_excel(
    url,
    sheet_name='49_Industry_Portfolios',
    skiprows=6,
    na_values=['-99.99'],
    usecols="A:AX"
)

# Rename the column with date information
df_ind.rename(columns={df_ind.columns[0]: 'date'}, inplace=True)

# Convert the date column to datetime
df_ind['date'] = pd.to_datetime(df_ind['date'], format='%Y%m')

# Set date as the index
df_ind.set_index('date', inplace=True)

# Check the dataframe
df_ind.info()
# your code below
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1069 entries, 1926-07-01 to 2015-07-01
Data columns (total 49 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Agric   1069 non-null   float64
 1   Food    1069 non-null   float64
 2   Soda    625 non-null    float64
 3   Beer    1069 non-null   float64
 4   Smoke   1069 non-null   float64
 5   Toys    1069 non-null   float64
 6   Fun     1069 non-null   float64
 7   Books   1069 non-null   float64
 8   Hshld   1069 non-null   float64
 9   Clths   1069 non-null   float64
 10  Hlth    553 non-null    float64
 11  MedEq   1069 non-null   float64
 12  Drugs   1069 non-null   float64
 13  Chems   1069 non-null   float64
 14  Rubbr   1009 non-null   float64
 15  Txtls   1069 non-null   float64
 16  BldMt   1069 non-null   float64
 17  Cnstr   1069 non-null   float64
 18  Steel   1069 non-null   float64
 19  FabPr   625 non-null    float64
 20  Mach    1069 non-null   float64
 21  ElcEq   1069 non-null   float64
 22  Autos   1069 non-null   float64
 23  Aero    1069 non-null   float64
 24  Ships   1069 non-null   float64
 25  Guns    625 non-null    float64
 26  Gold    625 non-null    float64
 27  Mines   1069 non-null   float64
 28  Coal    1069 non-null   float64
 29  Oil     1069 non-null   float64
 30  Util    1069 non-null   float64
 31  Telcm   1069 non-null   float64
 32  PerSv   1057 non-null   float64
 33  BusSv   1069 non-null   float64
 34  Hardw   1069 non-null   float64
 35  Softw   601 non-null    float64
 36  Chips   1069 non-null   float64
 37  LabEq   1069 non-null   float64
 38  Paper   1024 non-null   float64
 39  Boxes   1069 non-null   float64
 40  Trans   1069 non-null   float64
 41  Whlsl   1069 non-null   float64
 42  Rtail   1069 non-null   float64
 43  Meals   1069 non-null   float64
 44  Banks   1069 non-null   float64
 45  Insur   1069 non-null   float64
 46  RlEst   1069 non-null   float64
 47  Fin     1069 non-null   float64
 48  Other   1069 non-null   float64
dtypes: float64(49)
memory usage: 417.6 KB
df_ind
Agric Food Soda Beer Smoke Toys Fun Books Hshld Clths ... Boxes Trans Whlsl Rtail Meals Banks Insur RlEst Fin Other
date
1926-07-01 2.37 0.12 NaN -5.19 1.29 8.65 2.50 50.21 -0.48 8.08 ... 7.70 1.94 -23.79 0.07 1.87 4.61 -0.54 2.89 -4.85 5.20
1926-08-01 2.23 2.68 NaN 27.03 6.50 16.81 -0.76 42.98 -3.58 -2.51 ... -2.38 4.88 5.39 -0.75 -0.13 11.83 2.57 5.30 -0.57 6.76
1926-09-01 -0.57 1.58 NaN 4.02 1.26 8.33 6.42 -4.91 0.73 -0.51 ... -5.54 0.06 -7.87 0.25 -0.56 -1.75 0.72 -3.06 -3.14 -3.86
1926-10-01 -0.46 -3.68 NaN -3.31 1.06 -1.40 -5.09 5.37 -4.68 0.12 ... -5.08 -2.64 -15.38 -2.20 -4.11 -11.82 -4.28 -5.74 2.07 -8.49
1926-11-01 6.75 6.26 NaN 7.29 4.55 0.00 1.82 -6.40 -0.54 1.87 ... 3.84 1.60 4.67 6.52 4.33 -2.97 3.58 2.21 4.92 4.00
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2015-03-01 -5.28 2.47 -4.64 -2.07 -8.82 -4.11 -2.35 0.48 -1.98 1.23 ... -3.25 -3.62 0.61 0.99 -0.29 -0.83 2.28 3.91 -0.57 -2.13
2015-04-01 1.07 -0.23 -0.43 -0.52 5.94 9.25 2.62 -4.07 -2.41 -1.53 ... -1.72 -1.14 -1.20 -2.88 0.51 2.13 -1.78 -2.72 0.48 0.34
2015-05-01 3.43 2.82 0.20 2.00 1.28 -3.63 1.68 2.02 0.39 0.04 ... 0.33 -3.06 1.42 0.53 1.56 2.61 4.64 0.68 2.07 0.84
2015-06-01 -7.59 -0.79 -2.34 -1.71 -2.57 0.46 1.42 0.20 -0.11 4.17 ... -3.34 -3.25 -2.80 -0.48 0.29 1.67 2.17 -2.98 -0.58 -3.84
2015-07-01 -2.75 3.55 6.68 3.51 9.59 -3.85 8.00 -2.99 0.97 5.97 ... -0.90 3.27 -1.30 5.78 4.15 2.67 1.66 1.55 0.25 2.85

1069 rows Ă— 49 columns

Exercise 2. Advanced date manipulation

  1. convert the date from the start of the month to end of the month.

  2. call df_ind.head() and verify it works

Hint:

# your code below

df_ind.head()

Exercise 3. Importing risk-free rate

  1. In this same file there is another sheet with market returns and the risk-free rate. Import them as df_rmrf by following all the steps you did in the above two questions

  2. Call df_rmrf.info() so you check all the tasks were accomplished.

  3. In the next cell, call df_rmrf.head()

# your code below

Exercise 4. Constructing excess returns A

  1. for the industry Agric, construct the excess return by subtracting the risk-free rate RF from it.

  2. compute the mean of this excess return.

  3. print it along with the mean of the raw returns and the risk free rate to compare

# your code below

Exercise 5. Constructing excess returns B

  1. construct excess returns for all portfolio by subtracting the risk-free rate from all of columns at the same time

  2. name the new data frame df_inde ( for excess returns)

Hint:

  • You can do that using the method .subtract() with the option axis to tell along which dimension

  • Go ahead , google “pandas subtract” to see how this works

# your code below 

df_inde.head()

Exercise 6. Drop missing observations

You may notice that excess returns of some industries are not available at the beginning of the sample.

If we want all the industries to have same period of data in df_inde, we need to drop some observations.

Do the followings:

  1. Use method dropna to drop rows in df_inde if ANY industry is missing.

  2. After that, print(df_inde.shape) to see the changes in the length.

Hint

  • when you call dropna function, use axis and how option to drop missing values if ANY industry is missing

# your code below

Exercise 7. Moments

We will now estimate the risk-premium in each of these portfolio and the covariance between these portfolios.

Do the followings:

  1. using the method mean on the excess return data frame to obtain a vector of average excess returns.

  2. Using std construct an estimator for each asset standard deviation.

  3. use cov method to estimate the covariance of excess returns.

  4. Discuss in each units these variables are

# your code below
ERe.head()
CovRe.head()

Exercise 8. Plotting

Choose a couple of industry portfolios to plot their time-series.

Use the behavior of these two assets to discuss an important economic event in US history

as you discuss make sure to use the magnitudes in your discussion to show that you do understand what this data means

# your code below

Exercise 9. Cumulative returns

Choose two industry portfolios to plot the cumulative returns over time.

You can plot for the whole period or just a subperiod.

You should explain what the numbers mean in terms of how much money people would have if they had invested in these assets