import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
14.4. Multi-factor models
Adding factors to proxy for the total wealth portfolio
According to the CAPM, the wealth portfolio is the tangency portfolio
The return on the wealth portfolio summarizes how bad times are
Assets that co-vary more with the market must earn higher returns
They pay poorly exactly when times are bad
What is the wealth portfolio? It is the portfolio that holds all assets in proportion to their value
We often use the equity market as a proxy for this, but that is not the CAPM
In a way, the CAPM is untestable because it is impossible to construct the true wealth portfolio
The fact that we cannot really measure the returns of the total wealth portfolio gives us a reason to add other factors in addition to the returns on the equity market. Under this logic you add the returns of other asset classes:
Real estate
Corporate debt
Human capital
Private equity
Venture capital, …
So you get:
\[E[R_{i,t}]-R^f_t=\sum_a \beta_{i,a}\lambda_a\]
where \(a\) indexes the different asset classes and \(\lambda_a\) is the risk premium of asset class \(a\).
Sometimes people refer to these other hard-to-trade assets as “background risks”: risks that you are exposed to that are not captured by your financial assets
ICAPM: Intertemporal Capital Asset Pricing Model
Another way to rationalize additional factors is to recognize that the premium and the volatility of the market are time-varying.
This means that assets that allow you to hedge this investment-opportunity-set risk will be valuable
For example, an asset that pays well when the market is more/less volatile, or one that pays well when the expected returns of the market going forward are higher/lower
Under this logic, any strategy that is related to the market moments can also explain average returns on top of the market beta
Sometimes people refer to the background risks discussed earlier within this framework as well. But in this case it is not about the moments of the market portfolio per se, but about the correlation of the strategy with the background risks investors bear
The end result is the same:
\[E[R_{i,t}]-R^f_t=\sum_k \beta_{i,k}\lambda_k\]
but now the \(k\) are “factors” and do not need to be components of total wealth
but these factors must still be excess returns, i.e. they must be strategies based on financial assets that have price zero
APT: Arbitrage Pricing Theory
Yet another model that has exactly the same prediction
but now the factors are motivated because they explain time-series variation in realized returns
the idea is that if you have enough factors, you explain enough of the variation in the returns of an asset that, if the betas did not explain average returns, an arbitrage opportunity would open up
So you can think of the first two motivations as being more about “economic factors,” while this one is about “financial factors.” The factors belong if they help you explain returns.
APT in the end is a very statistical model
How to work with multi-factor models?
These models make less restrictive assumptions and predict that we will need additional factors beyond the market to “span” the tangency portfolio
What does “span” mean?
It is basically the same as before, but now we run a multivariate regression
\[R_{i,t}=\alpha_i+\beta_{mkt}R^{mkt}_t+\beta_1 Factor_{1,t}+\beta_2 Factor_{2,t}+\beta_3 Factor_{3,t}+\cdots+\beta_n Factor_{n,t}+\epsilon_{i,t}\]
As long as the factors are also excess returns, the time-series alpha test is still valid and identical to before.
You can do single asset tests and simply use the alpha t-stat to test the model
Everything that we discussed about hedging/tracking portfolios applies here as well, but now
\((\beta_{mkt}R^{mkt}_t+\beta_1 Factor_{1,t}+\beta_2 Factor_{2,t}+\beta_3 Factor_{3,t}+\cdots+\beta_n Factor_{n,t})\) is the tracking portfolio
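To make this concrete, here is a minimal sketch of the multivariate time-series regression and alpha test; the two factor series, the betas, and the noise are simulated purely for illustration.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data purely for illustration: two made-up factors and one asset
rng = np.random.default_rng(0)
T = 600
factors = pd.DataFrame({'Mkt-RF': 0.006 + 0.04*rng.standard_normal(T),
                        'Factor1': 0.003 + 0.03*rng.standard_normal(T)})
# Asset with betas (0.8, 0.4), zero true alpha, plus idiosyncratic noise
r_excess = factors @ np.array([0.8, 0.4]) + 0.02*rng.standard_normal(T)

# Multivariate time-series regression; the t-stat on the constant is the alpha test
res = sm.OLS(r_excess, sm.add_constant(factors)).fit()
print(res.params, res.tvalues['const'])

# The fitted factor part is the tracking portfolio; alpha plus the residual is the
# part the factors do not span
tracking = res.fittedvalues - res.params['const']
unspanned = r_excess - tracking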
Which Multi-factor models?
It really depends on the application. Practitioners often use factor models with 10 or more factors because they are interested in performance attribution.
You use the regression to see which factors explain some particular realized performance, without thinking too much about whether the factors explain average returns.
This is a perfectly good use of multi-factor models. In this case you are not really asking whether the factors help you span the tangency portfolio, only whether they help explain the realized return of the strategy.
If you are interested in accounting for a fund/strategy/stock average return, you have to look for factors that help you span the tangency portfolio, i.e. factors that generate high Sharpe ratios.
In the academic world, and also among a good chunk of practitioners, people use the following factor models
Fama French 3 factor model: \[R_{i,t}=\alpha_i+\beta_{mkt}R^{mkt}_t+\beta_{smb}SmB_t+\beta_{hml}HmL_t+\epsilon_{i,t}\]
Carhart 4 factor model: \[R_{i,t}=\alpha_i+\beta_{mkt}R^{mkt}_t+\beta_{smb}SmB_t+\beta_{hml}HmL_t+\beta_{wml}WmL_t+\epsilon_{i,t}\]
Fama French 5 factor model: \[R_{i,t}=\alpha_i+\beta_{mkt}R^{mkt}_t+\beta_{smb}SmB_t+\beta_{hml}HmL_t+\beta_{rmw}RmW_t+\beta_{cma}CmA_t+\epsilon_{i,t}\]
Fama French 5 factor model + Momentum: \[R_{i,t}=\alpha_i+\beta_{mkt}R^{mkt}_t+\beta_{smb}SmB_t+\beta_{hml}HmL_t+\beta_{rmw}RmW_t+\beta_{cma}CmA_t+\beta_{wml}WmL_t+\epsilon_{i,t}\]
where:
HmL is the value strategy that buys high book-to-market firms and sells low book-to-market firms
SmB is the size strategy that buys firms with low market capitalization and sells firms with high market capitalization
WmL is the momentum strategy that buys stocks that did well over the last 12 months and shorts the ones that did poorly
RmW is the profitability strategy that buys firms with high gross profitability and sells firms with low gross profitability
CmA is the investment strategy that buys firms that are investing little (low CAPEX) and sells firms that are investing a lot (high CAPEX)
Three applications: evaluating asset managers
We will do this in five steps
Get time-series of the factors in our factor model and the risk-free rate
For each application get the time-series return data of the asset being evaluated
Merge with factor series
Convert the returns of the asset being evaluated to excess returns
Run a time-series regression of the asset excess returns on the relevant factors
Factors for our factor model
We will start by downloading the five factors of the Fama-French 5 factor model, plus the momentum factor, which comes in a separate data set
We of course could have constructed them ourselves, but this is obviously faster and less error-prone
from datetime import datetime
import pandas_datareader.data as web
start = datetime(1926, 1, 1)
# Five Fama-French factors (monthly, in percent); keep observations through 2021-03
ds = web.DataReader('F-F_Research_Data_5_Factors_2x3', 'famafrench',start=start)
dff5=ds[0][:'2021-3']
dff5
# The momentum factor comes in a separate Ken French data set
ds = web.DataReader('F-F_Momentum_Factor', 'famafrench',start=start)
dmom=ds[0][:'2021-3']
dmom
# Merge the five factors with momentum on the monthly date index
dff6=dff5.merge(dmom,left_index=True,right_index=True,how='inner')
dff6
| Date | Mkt-RF | SMB | HML | RMW | CMA | RF | Mom |
|---|---|---|---|---|---|---|---|
| 1963-07 | -0.39 | -0.45 | -0.94 | 0.66 | -1.15 | 0.27 | 1.00 |
| 1963-08 | 5.07 | -0.82 | 1.82 | 0.40 | -0.40 | 0.25 | 1.03 |
| 1963-09 | -1.57 | -0.48 | 0.17 | -0.76 | 0.24 | 0.27 | 0.16 |
| 1963-10 | 2.53 | -1.30 | -0.04 | 2.75 | -2.24 | 0.29 | 3.14 |
| 1963-11 | -0.85 | -0.85 | 1.70 | -0.45 | 2.22 | 0.27 | -0.75 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2020-11 | 12.47 | 6.75 | 2.11 | -2.78 | 1.05 | 0.01 | -12.25 |
| 2020-12 | 4.63 | 4.67 | -1.36 | -2.15 | 0.00 | 0.01 | -2.42 |
| 2021-01 | -0.03 | 6.88 | 2.85 | -3.33 | 4.68 | 0.00 | 4.37 |
| 2021-02 | 2.78 | 4.51 | 7.08 | 0.09 | -1.97 | 0.00 | -7.68 |
| 2021-03 | 3.09 | -0.98 | 7.40 | 6.43 | 3.44 | 0.00 | -5.83 |

693 rows × 7 columns
Application 1: Performance evaluation of Berkshire Hathaway
How much of Warren Buffett's performance can be explained by his exposure to the different risk factors?
How much of Warren Buffett's performance is due to his ability to buy good businesses for cheap?
Getting the Data
We will work with the data at monthly frequency.
While daily data is helpful for estimating betas, we have to be mindful of liquidity effects, for example bid-ask bounce
One could use daily data and add lagged returns as a way to deal with this, as in the sketch below
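A minimal sketch of that approach: regress daily excess returns on the contemporaneous and lagged market excess returns and sum the slopes (a Dimson-style beta). The daily_asset and daily_mkt series are hypothetical placeholders you would build from daily CRSP and Fama-French data, and the lag count is a choice.

import pandas as pd
import statsmodels.api as sm

def dimson_beta(daily_asset, daily_mkt, lags=2):
    # daily_asset, daily_mkt: aligned pd.Series of daily excess returns
    # (hypothetical inputs; build them from daily CRSP / Fama-French files)
    X = pd.concat({f'mkt_lag{l}': daily_mkt.shift(l) for l in range(lags+1)}, axis=1)
    data = pd.concat([daily_asset.rename('asset'), X], axis=1).dropna()
    res = sm.OLS(data['asset'], sm.add_constant(data.drop(columns='asset'))).fit()
    # Summing the contemporaneous and lagged slopes corrects for stale prices
    return res.params.drop('const').sum()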
import wrds
conn=wrds.Connection()
def get_returns(tickers,conn,startdate,enddate,variable='ret'):
    # Pull monthly CRSP data for the first ticker in the list and return a
    # DataFrame of `variable` with one column per permno
    ticker=tickers[0]
    df = conn.raw_sql("""
        select a.permno, b.ncusip, b.ticker, a.date, b.shrcd,
        a.ret, a.vol, a.shrout, a.prc, a.cfacpr, a.cfacshr,a.retx
        from crsp.msf as a
        left join crsp.msenames as b
        on a.permno=b.permno
        and b.namedt<=a.date
        and a.date<=b.nameendt
        where a.date between '"""+ startdate+"""' and '"""+ enddate+
        """' and b.ticker='"""+ticker+"'")
    # and b.shrcd between 10 and 11
    # and b.exchcd between 1 and 3
    df.set_index(['date','permno'],inplace=True)
    df=df[variable].unstack()
    return df
Enter your WRDS username [Alan.Moreira]: moreira5
Enter your password: ···········
WRDS recommends setting up a .pgpass file.
You can find more info here:
https://www.postgresql.org/docs/9.5/static/libpq-pgpass.html.
Loading library list...
Done
df=get_returns(['BRK'],conn,'1/1/1960','1/1/2022',variable='prc')
df.abs().plot()
<AxesSubplot:xlabel='date'>
Berkshire seems to have had multiple securities issued over time
We will simply use the blue series, which matches what I see on Google, but you could think about piecing together the blue and the green to expand the sample. One would have to read the details to see what happened and whether that really makes sense
df=get_returns(['BRK'],conn,'1/1/1960','1/1/2022',variable='ret')
(df[17778.0]+1).cumprod().plot()
<AxesSubplot:xlabel='date'>
Let's now merge with the factors
First note that both are monthly, but the factors index is of period type while the Berkshire index is datetime
dff6.index
PeriodIndex(['1963-07', '1963-08', '1963-09', '1963-10', '1963-11', '1963-12',
'1964-01', '1964-02', '1964-03', '1964-04',
...
'2020-06', '2020-07', '2020-08', '2020-09', '2020-10', '2020-11',
'2020-12', '2021-01', '2021-02', '2021-03'],
dtype='period[M]', name='Date', length=693, freq='M')
df.index
DatetimeIndex(['1962-08-31', '1962-09-28', '1962-10-31', '1962-11-30',
'1962-12-31', '1963-01-31', '1963-02-28', '1963-03-29',
'1963-04-30', '1963-05-31',
...
'2020-03-31', '2020-04-30', '2020-05-29', '2020-06-30',
'2020-07-31', '2020-08-31', '2020-09-30', '2020-10-30',
'2020-11-30', '2020-12-31'],
dtype='datetime64[ns]', name='date', length=644, freq=None)
We can deal with this by converting either the df dataframe to period or the dff6 dataframe to datetime
df[17778.0].to_period("M").tail()
date
2020-08 0.115550
2020-09 -0.023077
2020-10 -0.054690
2020-11 0.136159
2020-12 0.012008
Freq: M, Name: 17778.0, dtype: float64
dff6.to_timestamp('M').tail()
| Date | Mkt-RF | SMB | HML | RMW | CMA | RF | Mom |
|---|---|---|---|---|---|---|---|
| 2020-11-30 | 12.47 | 6.75 | 2.11 | -2.78 | 1.05 | 0.01 | -12.25 |
| 2020-12-31 | 4.63 | 4.67 | -1.36 | -2.15 | 0.00 | 0.01 | -2.42 |
| 2021-01-31 | -0.03 | 6.88 | 2.85 | -3.33 | 4.68 | 0.00 | 4.37 |
| 2021-02-28 | 2.78 | 4.51 | 7.08 | 0.09 | -1.97 | 0.00 | -7.68 |
| 2021-03-31 | 3.09 | -0.98 | 7.40 | 6.43 | 3.44 | 0.00 | -5.83 |
Here I will use the first method
# Factors are in percent, so divide by 100; merge on the monthly period index
df=(dff6/100).merge(df[17778.0].to_period("M"),left_index=True,right_index=True,how='inner').dropna(how='any')
df.rename(columns={17778:'BRK'},inplace=True)
df
| Date | Mkt-RF | SMB | HML | RMW | CMA | RF | Mom | BRK |
|---|---|---|---|---|---|---|---|---|
| 1988-11 | -0.0229 | -0.0166 | 0.0130 | -0.0028 | 0.0160 | 0.0057 | 0.0042 | -0.005291 |
| 1988-12 | 0.0149 | 0.0201 | -0.0154 | 0.0063 | -0.0037 | 0.0063 | 0.0042 | 0.000000 |
| 1989-01 | 0.0610 | -0.0223 | 0.0049 | -0.0103 | 0.0016 | 0.0055 | -0.0014 | 0.047872 |
| 1989-02 | -0.0225 | 0.0267 | 0.0090 | -0.0081 | 0.0190 | 0.0061 | 0.0094 | -0.040609 |
| 1989-03 | 0.0157 | 0.0077 | 0.0047 | 0.0000 | 0.0079 | 0.0067 | 0.0356 | 0.047619 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2020-08 | 0.0763 | -0.0094 | -0.0294 | 0.0427 | -0.0144 | 0.0001 | 0.0051 | 0.115550 |
| 2020-09 | -0.0363 | 0.0007 | -0.0251 | -0.0115 | -0.0177 | 0.0001 | 0.0305 | -0.023077 |
| 2020-10 | -0.0210 | 0.0476 | 0.0403 | -0.0060 | -0.0053 | 0.0001 | -0.0303 | -0.054690 |
| 2020-11 | 0.1247 | 0.0675 | 0.0211 | -0.0278 | 0.0105 | 0.0001 | -0.1225 | 0.136159 |
| 2020-12 | 0.0463 | 0.0467 | -0.0136 | -0.0215 | 0.0000 | 0.0001 | -0.0242 | 0.012008 |

386 rows × 8 columns
# Convert BRK returns to excess returns
df.BRK=df.BRK-df.RF
Start with the CAPM
import statsmodels.api as sm
# CAPM regression: BRK excess return on the market excess return
temp=df.copy()
x= sm.add_constant(temp['Mkt-RF'])
y= temp['BRK']
results= sm.OLS(y,x).fit()
results.summary()
| Dep. Variable: | BRK | R-squared: | 0.212 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.210 |
| Method: | Least Squares | F-statistic: | 103.6 |
| Date: | Tue, 04 May 2021 | Prob (F-statistic): | 1.06e-21 |
| Time: | 09:25:48 | Log-Likelihood: | 592.00 |
| No. Observations: | 386 | AIC: | -1180. |
| Df Residuals: | 384 | BIC: | -1172. |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0060 | 0.003 | 2.210 | 0.028 | 0.001 | 0.011 |
| Mkt-RF | 0.6236 | 0.061 | 10.178 | 0.000 | 0.503 | 0.744 |

| Omnibus: | 83.751 | Durbin-Watson: | 2.060 |
|---|---|---|---|
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 261.498 |
| Skew: | 0.974 | Prob(JB): | 1.65e-57 |
| Kurtosis: | 6.530 | Cond. No. | 23.0 |

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
results.params[0]*12 # annualize the monthly alpha
0.07164885662375453
A market beta of about 0.6 and an alpha of about 7% per year!
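As a side calculation, a minimal sketch of the appraisal (information) ratio from the CAPM fit above: annualized alpha over annualized residual volatility, a measure of how much adding BRK improves the Sharpe ratio of an investor who already holds the market.

# Appraisal ratio: annualized alpha / annualized idiosyncratic volatility
alpha_annual = results.params['const']*12
resid_vol_annual = results.resid.std()*12**0.5
print(alpha_annual/resid_vol_annual)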
Six factor model
# Six-factor regression: use all factors as regressors (drop BRK itself and RF)
temp=df.copy()
x= sm.add_constant(temp.drop(['BRK','RF'],axis=1))
y= temp['BRK']
results= sm.OLS(y,x).fit()
results.summary()
| Dep. Variable: | BRK | R-squared: | 0.348 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.338 |
| Method: | Least Squares | F-statistic: | 33.74 |
| Date: | Tue, 04 May 2021 | Prob (F-statistic): | 1.35e-32 |
| Time: | 09:26:02 | Log-Likelihood: | 628.50 |
| No. Observations: | 386 | AIC: | -1243. |
| Df Residuals: | 379 | BIC: | -1215. |
| Df Model: | 6 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0046 | 0.003 | 1.774 | 0.077 | -0.001 | 0.010 |
| Mkt-RF | 0.7599 | 0.067 | 11.357 | 0.000 | 0.628 | 0.892 |
| SMB | -0.3946 | 0.091 | -4.316 | 0.000 | -0.574 | -0.215 |
| HML | 0.4836 | 0.118 | 4.091 | 0.000 | 0.251 | 0.716 |
| RMW | 0.2687 | 0.121 | 2.214 | 0.027 | 0.030 | 0.507 |
| CMA | -0.1286 | 0.172 | -0.749 | 0.454 | -0.466 | 0.209 |
| Mom | -0.0077 | 0.057 | -0.135 | 0.892 | -0.119 | 0.104 |

| Omnibus: | 89.527 | Durbin-Watson: | 2.143 |
|---|---|---|---|
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 237.376 |
| Skew: | 1.105 | Prob(JB): | 2.85e-52 |
| Kurtosis: | 6.143 | Cond. No. | 80.7 |

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
print(results.params[0]*12)
0.055415983696528254
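To see where the reduction in alpha comes from, a minimal sketch of a performance attribution: each factor's contribution to BRK's average excess return is its estimated beta times the factor's sample average premium, and alpha is the unexplained remainder.

# Decompose BRK's average monthly excess return: alpha + sum of beta_k * mean(factor_k)
betas = results.params.drop('const')
contrib = betas*df[betas.index].mean()
print(contrib*12)                  # annualized contribution of each factor
print(results.params['const']*12)  # annualized alpha (the unexplained part)
print(df['BRK'].mean()*12)         # equals alpha plus the sum of the contributions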
What do we learn about Warren Buffett's investment style?
What do we learn about his specific stock-picking skills?
Application 2: Does Cathie Wood have some secret sauce? Performance evaluation of ARKK
df=get_returns(['ARKK'],conn,'1/1/1960','1/1/2022',variable='ret')
(df+1).cumprod().plot()
<AxesSubplot:xlabel='date'>
# Merge ARKK with the factors and convert to excess returns
df=(dff6/100).merge(df.to_period("M"),left_index=True,right_index=True,how='inner').dropna(how='any')
df.rename(columns={14948.0:'ARKK'},inplace=True)
df.ARKK=df.ARKK-df.RF
temp=df.copy()
x= sm.add_constant(temp['Mkt-RF'])
y= temp['ARKK']
results= sm.OLS(y,x).fit()
results.summary()
| Dep. Variable: | ARKK | R-squared: | 0.691 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.687 |
| Method: | Least Squares | F-statistic: | 160.9 |
| Date: | Tue, 04 May 2021 | Prob (F-statistic): | 4.98e-20 |
| Time: | 09:28:12 | Log-Likelihood: | 120.93 |
| No. Observations: | 74 | AIC: | -237.9 |
| Df Residuals: | 72 | BIC: | -233.2 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0115 | 0.006 | 2.016 | 0.048 | 0.000 | 0.023 |
| Mkt-RF | 1.5768 | 0.124 | 12.684 | 0.000 | 1.329 | 1.825 |

| Omnibus: | 4.334 | Durbin-Watson: | 1.974 |
|---|---|---|---|
| Prob(Omnibus): | 0.115 | Jarque-Bera (JB): | 3.484 |
| Skew: | 0.465 | Prob(JB): | 0.175 |
| Kurtosis: | 3.516 | Cond. No. | 22.3 |

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
print(results.params[0]*12)
0.1385421007873513
temp=df.copy()
x= sm.add_constant(temp.drop(['ARKK','RF'],axis=1))
y= temp['ARKK']
results= sm.OLS(y,x).fit()
results.summary()
| Dep. Variable: | ARKK | R-squared: | 0.818 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.802 |
| Method: | Least Squares | F-statistic: | 50.21 |
| Date: | Tue, 04 May 2021 | Prob (F-statistic): | 6.69e-23 |
| Time: | 09:28:44 | Log-Likelihood: | 140.55 |
| No. Observations: | 74 | AIC: | -267.1 |
| Df Residuals: | 67 | BIC: | -251.0 |
| Df Model: | 6 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0059 | 0.005 | 1.234 | 0.221 | -0.004 | 0.015 |
| Mkt-RF | 1.3860 | 0.126 | 11.009 | 0.000 | 1.135 | 1.637 |
| SMB | 0.5065 | 0.210 | 2.410 | 0.019 | 0.087 | 0.926 |
| HML | -0.8120 | 0.200 | -4.067 | 0.000 | -1.211 | -0.413 |
| RMW | -0.1642 | 0.326 | -0.504 | 0.616 | -0.815 | 0.487 |
| CMA | -0.9219 | 0.351 | -2.624 | 0.011 | -1.623 | -0.221 |
| Mom | -0.2654 | 0.152 | -1.747 | 0.085 | -0.569 | 0.038 |

| Omnibus: | 10.376 | Durbin-Watson: | 2.119 |
|---|---|---|---|
| Prob(Omnibus): | 0.006 | Jarque-Bera (JB): | 10.331 |
| Skew: | 0.806 | Prob(JB): | 0.00571 |
| Kurtosis: | 3.866 | Cond. No. | 83.6 |

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
print(results.params[0]*12)
0.0704257048781867
What do we learn about Cathie Wood's investment style?
What do we learn about Cathie Wood's investment skills?
How do you think about the R-squared? Is a large R-squared good or bad in this case?
Application 3: What is a good momentum ETF?
# Candidate momentum ETFs: pull returns for each and merge into one dataframe
names=['VFMO','QMOM','FDMO','MTUM','IMTM','IMOM']
dmom=get_returns([names[0]],conn,'1/1/1960','1/1/2022')
for n in names[1:]:
    df=get_returns([n],conn,'1/1/1960','1/1/2022')
    dmom=dmom.merge(df,left_index=True,right_index=True,how='outer')
dmom.columns=names
(dmom+1).cumprod().plot()
<AxesSubplot:xlabel='date'>
# Merge the ETF returns with the factors
df=(dff6/100).merge(dmom.to_period("M"),left_index=True,right_index=True,how='inner')
# Convert to excess returns
df[names]=df[names].subtract(df['RF'],axis=0)
# Note that the momentum factor column name carries a trailing space: 'Mom '
df.columns
Index(['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA', 'RF', 'Mom ', 'VFMO', 'QMOM',
'FDMO', 'MTUM', 'IMTM', 'IMOM'],
dtype='object')
# Two-factor (market + momentum) regression for each ETF
temp=df.copy()
x= sm.add_constant(temp[['Mkt-RF','Mom ']])
results=pd.DataFrame([],index=[],columns=names)
for n in names:
    y= temp[n]
    res= sm.OLS(y,x,missing='drop').fit()
    results.at['alpha',n]=res.params[0]
    results.at['t(alpha)',n]=res.tvalues[0]
    results.at['betamkt',n]=res.params[1]
    results.at['t(betamkt)',n]=res.tvalues[1]
    results.at['betamom',n]=res.params[2]
    results.at['t(betamom)',n]=res.tvalues[2]
    results.at['R-squared',n]=res.rsquared
    # Annualized idiosyncratic (residual) volatility
    results.at['idio risk',n]=res.resid.std()*12**0.5
    results.at['sample size',n]=res.nobs
results
| | VFMO | QMOM | FDMO | MTUM | IMTM | IMOM |
|---|---|---|---|---|---|---|
| alpha | -0.001458 | -0.002718 | -0.002354 | 0.001112 | -0.002008 | -0.005894 |
| t(alpha) | -0.430959 | -0.63581 | -1.664363 | 0.869311 | -0.847557 | -1.459713 |
| betamkt | 1.132283 | 1.39176 | 1.024135 | 1.00483 | 0.761741 | 0.958616 |
| t(betamkt) | 16.762846 | 13.753471 | 31.996319 | 31.597999 | 13.558808 | 10.029335 |
| betamom | 0.186237 | 0.394746 | 0.254752 | 0.345416 | 0.09699 | 0.15187 |
| t(betamom) | 2.178768 | 3.281823 | 6.461063 | 9.626601 | 1.568105 | 1.336749 |
| R-squared | 0.917845 | 0.782982 | 0.960205 | 0.919332 | 0.756383 | 0.674357 |
| idio risk | 0.064076 | 0.107838 | 0.032342 | 0.039782 | 0.065174 | 0.101857 |
| sample size | 34.0 | 60.0 | 51.0 | 92.0 | 71.0 | 60.0 |
What do you conclude?
Which ETF would you pick?
How do you think about the alpha here?
How do you think about the R-squared? Is a large R-squared good or bad in this case?
How do you account for the difference in market betas when investing?
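On the last question, a minimal sketch of one approach, assuming you can lever or delever at the risk-free rate: scale each ETF's excess return by the inverse of its estimated market beta so every position carries the same market exposure, then compare the scaled average returns.

# Beta-adjust: scale each ETF's excess return to a market beta of one
# (assumes borrowing/lending at the risk-free rate)
betas = results.loc['betamkt'].astype(float)
scaled = df[names]/betas                # columns align with the betas' index
print((scaled.mean()*12).round(3))      # annualized beta-adjusted mean excess returns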