Assignment 1

16.6. Assignment 1#

16.7. Write your name and simon email#

Please write names below

[Name, student ID, email]:
[Name, student ID, email]:
[Name, student ID, email]:

16.8. Exercises#

Exercise 1

Start by importing pandas, numpy, maplotlib, and loading the data set.

The dataset has address

url='https://github.com/amoreira2/Fin418/blob/main/assets/data/Assignment1.xlsx?raw=true'

I strongly recommend you download first and look at the data set.

This file contains multiple sheets, you should use read_excel to get the data that contains the 49 value-weighted industry portfolios.

See here:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html .

Do the followings:

Import this dataframe as df_ind
- Use “sheet_name” to select the desired excel sheet.
- Use “skip_rows” to skip the initial rows before the data.
- Figure out what is the code for missing value and change the option na_values appropriately
- If you look at the excel file you will see that there are other data sets stacked horizontally. Use the usecols option to select the range of columns you want imported
- You can take a look at Lab3 material for some examples of importing excel data and using these options.
Change the name of the column with the date information to date
Use to_datetime so python understand the column date as a datetime object (you will have to use the option format)
Set date as index
Call df_ind.info() so you check all the tasks were accomplished.
In the next cell, call df_ind.head()

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
from pandas.tseries.offsets import MonthEnd

url = 'https://github.com/amoreira2/Fin418/blob/main/assets/data/Assignment1.xlsx?raw=true'

# Import the data
df_ind = pd.read_excel(
    url,
    sheet_name='49_Industry_Portfolios',
    skiprows=6,
    na_values=['-99.99'],
    usecols="A:AX"
)

# Rename the column with date information
df_ind.rename(columns={df_ind.columns[0]: 'date'}, inplace=True)

# Convert the date column to datetime
df_ind['date'] = pd.to_datetime(df_ind['date'], format='%Y%m')

# Set date as the index
df_ind.set_index('date', inplace=True)

# Check the dataframe
df_ind.info()
# your code below

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1069 entries, 1926-07-01 to 2015-07-01
Data columns (total 49 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 Agric   1069 non-null   float64
 Food    1069 non-null   float64
 Soda    625 non-null    float64
 Beer    1069 non-null   float64
 Smoke   1069 non-null   float64
 Toys    1069 non-null   float64
 Fun     1069 non-null   float64
 Books   1069 non-null   float64
 Hshld   1069 non-null   float64
 Clths   1069 non-null   float64
Hlth    553 non-null    float64
MedEq   1069 non-null   float64
Drugs   1069 non-null   float64
Chems   1069 non-null   float64
Rubbr   1009 non-null   float64
Txtls   1069 non-null   float64
BldMt   1069 non-null   float64
Cnstr   1069 non-null   float64
Steel   1069 non-null   float64
FabPr   625 non-null    float64
Mach    1069 non-null   float64
ElcEq   1069 non-null   float64
Autos   1069 non-null   float64
Aero    1069 non-null   float64
Ships   1069 non-null   float64
Guns    625 non-null    float64
Gold    625 non-null    float64
Mines   1069 non-null   float64
Coal    1069 non-null   float64
Oil     1069 non-null   float64
Util    1069 non-null   float64
Telcm   1069 non-null   float64
PerSv   1057 non-null   float64
BusSv   1069 non-null   float64
Hardw   1069 non-null   float64
Softw   601 non-null    float64
Chips   1069 non-null   float64
LabEq   1069 non-null   float64
Paper   1024 non-null   float64
Boxes   1069 non-null   float64
Trans   1069 non-null   float64
Whlsl   1069 non-null   float64
Rtail   1069 non-null   float64
Meals   1069 non-null   float64
Banks   1069 non-null   float64
Insur   1069 non-null   float64
RlEst   1069 non-null   float64
Fin     1069 non-null   float64
Other   1069 non-null   float64
dtypes: float64(49)
memory usage: 417.6 KB

df_ind

	Agric	Food	Soda	Beer	Smoke	Toys	Fun	Books	Hshld	Clths	...	Boxes	Trans	Whlsl	Rtail	Meals	Banks	Insur	RlEst	Fin	Other
date
1926-07-01	2.37	0.12	NaN	-5.19	1.29	8.65	2.50	50.21	-0.48	8.08	...	7.70	1.94	-23.79	0.07	1.87	4.61	-0.54	2.89	-4.85	5.20
1926-08-01	2.23	2.68	NaN	27.03	6.50	16.81	-0.76	42.98	-3.58	-2.51	...	-2.38	4.88	5.39	-0.75	-0.13	11.83	2.57	5.30	-0.57	6.76
1926-09-01	-0.57	1.58	NaN	4.02	1.26	8.33	6.42	-4.91	0.73	-0.51	...	-5.54	0.06	-7.87	0.25	-0.56	-1.75	0.72	-3.06	-3.14	-3.86
1926-10-01	-0.46	-3.68	NaN	-3.31	1.06	-1.40	-5.09	5.37	-4.68	0.12	...	-5.08	-2.64	-15.38	-2.20	-4.11	-11.82	-4.28	-5.74	2.07	-8.49
1926-11-01	6.75	6.26	NaN	7.29	4.55	0.00	1.82	-6.40	-0.54	1.87	...	3.84	1.60	4.67	6.52	4.33	-2.97	3.58	2.21	4.92	4.00
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2015-03-01	-5.28	2.47	-4.64	-2.07	-8.82	-4.11	-2.35	0.48	-1.98	1.23	...	-3.25	-3.62	0.61	0.99	-0.29	-0.83	2.28	3.91	-0.57	-2.13
2015-04-01	1.07	-0.23	-0.43	-0.52	5.94	9.25	2.62	-4.07	-2.41	-1.53	...	-1.72	-1.14	-1.20	-2.88	0.51	2.13	-1.78	-2.72	0.48	0.34
2015-05-01	3.43	2.82	0.20	2.00	1.28	-3.63	1.68	2.02	0.39	0.04	...	0.33	-3.06	1.42	0.53	1.56	2.61	4.64	0.68	2.07	0.84
2015-06-01	-7.59	-0.79	-2.34	-1.71	-2.57	0.46	1.42	0.20	-0.11	4.17	...	-3.34	-3.25	-2.80	-0.48	0.29	1.67	2.17	-2.98	-0.58	-3.84
2015-07-01	-2.75	3.55	6.68	3.51	9.59	-3.85	8.00	-2.99	0.97	5.97	...	-0.90	3.27	-1.30	5.78	4.15	2.67	1.66	1.55	0.25	2.85

1069 rows × 49 columns

Exercise 2. Advanced date manipulation

convert the date from the start of the month to end of the month.
call df_ind.head() and verify it works

Hint:

Read this link: https://stackoverflow.com/questions/37354105/find-the-end-of-the-month-of-a-pandas-dataframe-series. If you google “pandas end of month” that is the first thing that comes out. Read the answer and apply to your problem.
you aready set date as index, so you cannot do stuff like df_ind.date or df_ind['date'] and have to adjust the code accordingly. Think about how to access the index.

# your code below

df_ind.head()

Exercise 3. Importing risk-free rate

In this same file there is another sheet with market returns and the risk-free rate. Import them as df_rmrf by following all the steps you did in the above two questions
Call df_rmrf.info() so you check all the tasks were accomplished.
In the next cell, call df_rmrf.head()

# your code below

Exercise 4. Constructing excess returns A

for the industry Agric, construct the excess return by subtracting the risk-free rate RF from it.
compute the mean of this excess return.
print it along with the mean of the raw returns and the risk free rate to compare

# your code below

Exercise 5. Constructing excess returns B

construct excess returns for all portfolio by subtracting the risk-free rate from all of columns at the same time
name the new data frame df_inde ( for excess returns)

Hint:

You can do that using the method .subtract() with the option axis to tell along which dimension
Go ahead , google “pandas subtract” to see how this works

# your code below 

df_inde.head()

Exercise 6. Drop missing observations

You may notice that excess returns of some industries are not available at the beginning of the sample.

If we want all the industries to have same period of data in df_inde, we need to drop some observations.

Do the followings:

Use method dropna to drop rows in df_inde if ANY industry is missing.
After that, print(df_inde.shape) to see the changes in the length.

Hint

when you call dropna function, use axis and how option to drop missing values if ANY industry is missing

# your code below

Exercise 7. Moments

We will now estimate the risk-premium in each of these portfolio and the covariance between these portfolios.

Do the followings:

using the method mean on the excess return data frame to obtain a vector of average excess returns.
Using std construct an estimator for each asset standard deviation.
use cov method to estimate the covariance of excess returns.
Discuss in each units these variables are

# your code below

ERe.head()

CovRe.head()

Exercise 8. Plotting

Choose a couple of industry portfolios to plot their time-series.

Use the behavior of these two assets to discuss an important economic event in US history

as you discuss make sure to use the magnitudes in your discussion to show that you do understand what this data means

# your code below

Exercise 9. Cumulative returns

Choose two industry portfolios to plot the cumulative returns over time.

You can plot for the whole period or just a subperiod.

You should explain what the numbers mean in terms of how much money people would have if they had invested in these assets

Assignment 1

Contents

16.6. Assignment 1#

16.7. Write your name and simon email#

16.8. Exercises#