16.6. Assignment 1#
16.7. Write your name and simon email#
Please write names below
Please write names below
[Name, student ID, email]:
[Name, student ID, email]:
[Name, student ID, email]:
16.8. Exercises#
Exercise 1
Start by importing pandas, numpy, maplotlib, and loading the data set.
The dataset has address
url='https://github.com/amoreira2/Fin418/blob/main/assets/data/Assignment1.xlsx?raw=true'
I strongly recommend you download first and look at the data set.
This file contains multiple sheets, you should use read_excel
to get the data that contains the 49 value-weighted industry portfolios.
See here:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html .
Do the followings:
Import this dataframe as
df_ind
Use “sheet_name” to select the desired excel sheet.
Use “skip_rows” to skip the initial rows before the data.
Figure out what is the code for missing value and change the option
na_values
appropriatelyIf you look at the excel file you will see that there are other data sets stacked horizontally. Use the
usecols
option to select the range of columns you want importedYou can take a look at Lab3 material for some examples of importing excel data and using these options.
Change the name of the column with the date information to date
Use
to_datetime
so python understand the column date as a datetime object (you will have to use the option format)Set date as index
Call
df_ind.info()
so you check all the tasks were accomplished.In the next cell, call
df_ind.head()
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.tseries.offsets import MonthEnd
url = 'https://github.com/amoreira2/Fin418/blob/main/assets/data/Assignment1.xlsx?raw=true'
# Import the data
df_ind = pd.read_excel(
url,
sheet_name='49_Industry_Portfolios',
skiprows=6,
na_values=['-99.99'],
usecols="A:AX"
)
# Rename the column with date information
df_ind.rename(columns={df_ind.columns[0]: 'date'}, inplace=True)
# Convert the date column to datetime
df_ind['date'] = pd.to_datetime(df_ind['date'], format='%Y%m')
# Set date as the index
df_ind.set_index('date', inplace=True)
# Check the dataframe
df_ind.info()
# your code below
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1069 entries, 1926-07-01 to 2015-07-01
Data columns (total 49 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Agric 1069 non-null float64
1 Food 1069 non-null float64
2 Soda 625 non-null float64
3 Beer 1069 non-null float64
4 Smoke 1069 non-null float64
5 Toys 1069 non-null float64
6 Fun 1069 non-null float64
7 Books 1069 non-null float64
8 Hshld 1069 non-null float64
9 Clths 1069 non-null float64
10 Hlth 553 non-null float64
11 MedEq 1069 non-null float64
12 Drugs 1069 non-null float64
13 Chems 1069 non-null float64
14 Rubbr 1009 non-null float64
15 Txtls 1069 non-null float64
16 BldMt 1069 non-null float64
17 Cnstr 1069 non-null float64
18 Steel 1069 non-null float64
19 FabPr 625 non-null float64
20 Mach 1069 non-null float64
21 ElcEq 1069 non-null float64
22 Autos 1069 non-null float64
23 Aero 1069 non-null float64
24 Ships 1069 non-null float64
25 Guns 625 non-null float64
26 Gold 625 non-null float64
27 Mines 1069 non-null float64
28 Coal 1069 non-null float64
29 Oil 1069 non-null float64
30 Util 1069 non-null float64
31 Telcm 1069 non-null float64
32 PerSv 1057 non-null float64
33 BusSv 1069 non-null float64
34 Hardw 1069 non-null float64
35 Softw 601 non-null float64
36 Chips 1069 non-null float64
37 LabEq 1069 non-null float64
38 Paper 1024 non-null float64
39 Boxes 1069 non-null float64
40 Trans 1069 non-null float64
41 Whlsl 1069 non-null float64
42 Rtail 1069 non-null float64
43 Meals 1069 non-null float64
44 Banks 1069 non-null float64
45 Insur 1069 non-null float64
46 RlEst 1069 non-null float64
47 Fin 1069 non-null float64
48 Other 1069 non-null float64
dtypes: float64(49)
memory usage: 417.6 KB
df_ind
Agric | Food | Soda | Beer | Smoke | Toys | Fun | Books | Hshld | Clths | ... | Boxes | Trans | Whlsl | Rtail | Meals | Banks | Insur | RlEst | Fin | Other | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
date | |||||||||||||||||||||
1926-07-01 | 2.37 | 0.12 | NaN | -5.19 | 1.29 | 8.65 | 2.50 | 50.21 | -0.48 | 8.08 | ... | 7.70 | 1.94 | -23.79 | 0.07 | 1.87 | 4.61 | -0.54 | 2.89 | -4.85 | 5.20 |
1926-08-01 | 2.23 | 2.68 | NaN | 27.03 | 6.50 | 16.81 | -0.76 | 42.98 | -3.58 | -2.51 | ... | -2.38 | 4.88 | 5.39 | -0.75 | -0.13 | 11.83 | 2.57 | 5.30 | -0.57 | 6.76 |
1926-09-01 | -0.57 | 1.58 | NaN | 4.02 | 1.26 | 8.33 | 6.42 | -4.91 | 0.73 | -0.51 | ... | -5.54 | 0.06 | -7.87 | 0.25 | -0.56 | -1.75 | 0.72 | -3.06 | -3.14 | -3.86 |
1926-10-01 | -0.46 | -3.68 | NaN | -3.31 | 1.06 | -1.40 | -5.09 | 5.37 | -4.68 | 0.12 | ... | -5.08 | -2.64 | -15.38 | -2.20 | -4.11 | -11.82 | -4.28 | -5.74 | 2.07 | -8.49 |
1926-11-01 | 6.75 | 6.26 | NaN | 7.29 | 4.55 | 0.00 | 1.82 | -6.40 | -0.54 | 1.87 | ... | 3.84 | 1.60 | 4.67 | 6.52 | 4.33 | -2.97 | 3.58 | 2.21 | 4.92 | 4.00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2015-03-01 | -5.28 | 2.47 | -4.64 | -2.07 | -8.82 | -4.11 | -2.35 | 0.48 | -1.98 | 1.23 | ... | -3.25 | -3.62 | 0.61 | 0.99 | -0.29 | -0.83 | 2.28 | 3.91 | -0.57 | -2.13 |
2015-04-01 | 1.07 | -0.23 | -0.43 | -0.52 | 5.94 | 9.25 | 2.62 | -4.07 | -2.41 | -1.53 | ... | -1.72 | -1.14 | -1.20 | -2.88 | 0.51 | 2.13 | -1.78 | -2.72 | 0.48 | 0.34 |
2015-05-01 | 3.43 | 2.82 | 0.20 | 2.00 | 1.28 | -3.63 | 1.68 | 2.02 | 0.39 | 0.04 | ... | 0.33 | -3.06 | 1.42 | 0.53 | 1.56 | 2.61 | 4.64 | 0.68 | 2.07 | 0.84 |
2015-06-01 | -7.59 | -0.79 | -2.34 | -1.71 | -2.57 | 0.46 | 1.42 | 0.20 | -0.11 | 4.17 | ... | -3.34 | -3.25 | -2.80 | -0.48 | 0.29 | 1.67 | 2.17 | -2.98 | -0.58 | -3.84 |
2015-07-01 | -2.75 | 3.55 | 6.68 | 3.51 | 9.59 | -3.85 | 8.00 | -2.99 | 0.97 | 5.97 | ... | -0.90 | 3.27 | -1.30 | 5.78 | 4.15 | 2.67 | 1.66 | 1.55 | 0.25 | 2.85 |
1069 rows Ă— 49 columns
Exercise 2. Advanced date manipulation
convert the date from the start of the month to end of the month.
call
df_ind.head()
and verify it works
Hint:
Read this link: https://stackoverflow.com/questions/37354105/find-the-end-of-the-month-of-a-pandas-dataframe-series. If you google “pandas end of month” that is the first thing that comes out. Read the answer and apply to your problem.
you aready set date as index, so you cannot do stuff like
df_ind.date
ordf_ind['date']
and have to adjust the code accordingly. Think about how to access the index.
# your code below
df_ind.head()
Exercise 3. Importing risk-free rate
In this same file there is another sheet with market returns and the risk-free rate. Import them as
df_rmrf
by following all the steps you did in the above two questionsCall
df_rmrf.info()
so you check all the tasks were accomplished.In the next cell, call
df_rmrf.head()
# your code below
Exercise 4. Constructing excess returns A
for the industry
Agric
, construct the excess return by subtracting the risk-free rate RF from it.compute the mean of this excess return.
print it along with the mean of the raw returns and the risk free rate to compare
# your code below
Exercise 5. Constructing excess returns B
construct excess returns for all portfolio by subtracting the risk-free rate from all of columns at the same time
name the new data frame
df_inde
( for excess returns)
Hint:
You can do that using the method
.subtract()
with the option axis to tell along which dimensionGo ahead , google “pandas subtract” to see how this works
# your code below
df_inde.head()
Exercise 6. Drop missing observations
You may notice that excess returns of some industries are not available at the beginning of the sample.
If we want all the industries to have same period of data in df_inde
, we need to drop some observations.
Do the followings:
Use method
dropna
to drop rows indf_inde
if ANY industry is missing.After that,
print(df_inde.shape)
to see the changes in the length.
Hint
when you call
dropna
function, useaxis
andhow
option to drop missing values if ANY industry is missing
# your code below
Exercise 7. Moments
We will now estimate the risk-premium in each of these portfolio and the covariance between these portfolios.
Do the followings:
using the method
mean
on the excess return data frame to obtain a vector of average excess returns.Using
std
construct an estimator for each asset standard deviation.use
cov
method to estimate the covariance of excess returns.Discuss in each units these variables are
# your code below
ERe.head()
CovRe.head()
Exercise 8. Plotting
Choose a couple of industry portfolios to plot their time-series.
Use the behavior of these two assets to discuss an important economic event in US history
as you discuss make sure to use the magnitudes in your discussion to show that you do understand what this data means
# your code below
Exercise 9. Cumulative returns
Choose two industry portfolios to plot the cumulative returns over time.
You can plot for the whole period or just a subperiod.
You should explain what the numbers mean in terms of how much money people would have if they had invested in these assets