3 Rational agent and behavioural finance

Throughout the chapter we use the following libraries:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.stats.api as sms
```
This chapter and the next two aim to build a portfolio using rational agent and behavioural finance theory and the market anomalies theory. The idea is to start with several assets (stocks) and filter them as follows.
Step 1. Market efficiency filter. We test each stock to verify whether we can use its historical information to forecast its price.
Step 2. Market anomalies filter. The strategy consists of buying assets when their prices are trending up and selling them when they are trending down. We pick the stocks that contribute to increasing the portfolio return.
Step 3. Portfolio allocation. Finally, we optimize the weights of the selected assets.
It is important to note that this is not the traditional Markowitz portfolio construction, where we only optimize the weights of the portfolio's assets. First we pick stocks, through what this book calls filters, that fulfill certain assumptions related to rational agent theory and market anomalies theory. After that stock picking, we optimize the weights of the assets.
3.1 Introduction to rational agent and behavioural finance
Around 1970, economists argued that an efficient market should instantaneously reflect all the available information about a particular financial security, a claim known as the Efficient Market Hypothesis, EMH (Fama 1970). Under this view, arbitrage opportunities should be rare, or, as they used to argue, markets were not predictable. Academics were reasonably content with the EMH until the stock market behaved bizarrely in 1987, the year of the Dow Jones Industrial Average's historic collapse. What is interesting about 1987 is that trading folklore and the activities of leading academic economists fit the behavioural finance point of view, not the EMH point of view.
Economists actively discussing and acting in financial markets seemed to believe that markets were predictable, a fundamental principle of modern behavioural finance (Burton and Shah 2013). By designing systematic trading platforms, some traders aim to generate signals that consistently produce positive outcomes over many trades, and successful trading systems are usually tested on large amounts of historical data. A more scientific method for analyzing a particular financial security is to determine whether its price changes are random. If the price changes are random, the probability of detecting a consistently profitable trading opportunity for that security is negligible. On the other hand, if the price changes are non-random, the security exhibits persistent predictability and should be analyzed further. It is then possible to measure the relative availability of trading opportunities with market inefficiency tests (Aldridge 2010). In summary, if the tests detect that new information is incorporated slowly into asset prices, arbitrage opportunities exist and the market is inefficient.
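As a rough illustration of such randomness checks, a minimal sketch is a Ljung-Box autocorrelation test on daily returns with statsmodels. The return series below is synthetic, purely for demonstration; replace it with real data:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.diagnostic import acorr_ljungbox

# Synthetic daily returns (random by construction); replace with real data
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0, 0.01, 500))

# Ljung-Box test: H0 = no autocorrelation up to the chosen lags.
# A small p-value suggests non-random, persistent price changes.
print(acorr_ljungbox(returns, lags=[5], return_df=True))
```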
In this chapter, we apply the EMH test proposed by Wooldridge (2020) to identify arbitrage opportunities, that is, to find inefficient markets.
3.2 EMH test on historical returns for one asset
Suppose \(y_{t}\) is the daily price of the S&P 500. A strict form of the Efficient Market Hypothesis (EMH) establishes that historical information on the index before day t should not help predict the index. If we use only past information on \(y_{t}\), we can test this with the model:
\[y_t= \beta_0 +\beta_1\ y_{t-1} + \beta_2\ y_{t-2}+u_t\ (1)\]
where the right-hand side is the expected value of \(y_{t}\) given the historical information of the index, \(y_{t-1}, y_{t-2}, \ldots\). Under the EMH, that expected value does not depend on its own history, so \(\beta_1=\beta_2=0\); if the coefficients are significantly different from zero, historical information helps predict the current price and the market is not efficient. One advantage of this test is that it is easy to understand, assuming that you have at least basic econometric knowledge.
Suppose that we want to run the EMH test on the returns of AAPL (Apple).
For this chapter we use daily data for several stocks from January 2020 to May 2022.
```python
data = pd.read_csv("https://raw.githubusercontent.com/abernal30/AFP_py/refs/heads/main/data/1Rational_agent.csv", index_col=0)
data
```
date | AAPL.Close | MSFT.Close | GOOG.Close | GOOGL.Close | AMZN.Close | TSLA.Close | BRK.A.Close | BRK.B.Close | FB.Close | TSM.Close | ... | TMUS.Close | PM.Close | AMD.Close | LIN.Close | TXN.Close | CRM.Close | BMY.Close | UPS.Close | RLLCF.Close | QCOM.Close
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
01/02/2020 | 75.087502 | 160.619995 | 1367.369995 | 1368.680054 | 1898.010010 | 86.052002 | 342261 | 228.389999 | 209.779999 | 60.040001 | ... | 78.589996 | 85.190002 | 49.099998 | 210.740005 | 129.570007 | 166.990005 | 63.340000 | 116.790001 | 0.0046 | 88.690002 |
01/03/2020 | 74.357498 | 158.619995 | 1360.660034 | 1361.520020 | 1874.969971 | 88.601997 | 339155 | 226.179993 | 208.669998 | 58.060001 | ... | 78.169998 | 85.029999 | 48.599998 | 205.259995 | 127.849998 | 166.169998 | 62.779999 | 116.720001 | 0.0100 | 87.019997 |
01/06/2020 | 74.949997 | 159.029999 | 1394.209961 | 1397.810059 | 1902.880005 | 90.307999 | 340210 | 226.990005 | 212.600006 | 57.389999 | ... | 78.620003 | 86.019997 | 48.389999 | 204.389999 | 126.959999 | 173.449997 | 62.980000 | 116.199997 | 0.0217 | 86.510002 |
01/07/2020 | 74.597504 | 157.580002 | 1393.339966 | 1395.109985 | 1906.859985 | 93.811996 | 338901 | 225.919998 | 213.059998 | 58.320000 | ... | 78.919998 | 86.400002 | 48.250000 | 204.830002 | 129.410004 | 176.000000 | 63.930000 | 116.000000 | 0.0126 | 88.970001 |
01/08/2020 | 75.797501 | 160.089996 | 1404.319946 | 1405.040039 | 1891.969971 | 98.428001 | 339188 | 225.990005 | 215.220001 | 58.750000 | ... | 79.419998 | 88.040001 | 47.830002 | 207.389999 | 129.759995 | 177.330002 | 63.860001 | 116.660004 | 0.0099 | 88.709999 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
05/20/2022 | 137.589996 | 252.559998 | 2186.260010 | 2178.159912 | 2151.820068 | 663.900024 | 456500 | 304.049988 | 193.539993 | 90.779999 | ... | 126.040001 | 101.150002 | 93.500000 | 315.179993 | 169.809998 | 159.649994 | 76.190002 | 171.039993 | 0.0072 | 131.600006 |
05/23/2022 | 143.110001 | 260.649994 | 2233.330078 | 2229.760010 | 2151.139893 | 674.900024 | 464510 | 310.200012 | 196.229996 | 91.500000 | ... | 129.889999 | 102.910004 | 95.070000 | 320.420013 | 169.929993 | 160.320007 | 76.699997 | 174.389999 | 0.0085 | 132.119995 |
05/24/2022 | 140.360001 | 259.619995 | 2118.520020 | 2119.399902 | 2082.000000 | 628.159973 | 463606 | 309.170013 | 181.279999 | 88.720001 | ... | 129.220001 | 106.620003 | 91.160004 | 320.489990 | 167.860001 | 156.929993 | 77.129997 | 174.110001 | 0.0075 | 128.529999 |
05/25/2022 | 140.520004 | 262.519989 | 2116.790039 | 2116.100098 | 2135.500000 | 658.799988 | 462890 | 308.640015 | 183.830002 | 90.410004 | ... | 131.440002 | 108.570000 | 92.650002 | 315.850006 | 170.009995 | 159.649994 | 77.239998 | 173.860001 | 0.0080 | 131.229996 |
05/26/2022 | 143.779999 | 265.899994 | 2165.919922 | 2155.850098 | 2221.550049 | 707.729980 | 468805 | 312.500000 | 191.630005 | 91.000000 | ... | 132.740005 | 108.070000 | 98.750000 | 320.329987 | 174.130005 | 162.460007 | 77.589996 | 178.380005 | 0.0085 | 134.839996 |
606 rows × 100 columns
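Note that the dates are read in as plain strings in MM/DD/YYYY format, and the tests below rely only on row order. If you later need date arithmetic or resampling, an optional step, not used in the rest of this chapter, is to parse the index; a minimal sketch assuming the format shown above:

```python
# Optional: convert the MM/DD/YYYY string index into a DatetimeIndex
data.index = pd.to_datetime(data.index, format="%m/%d/%Y")
```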
="AAPL.Close"
stock=data[stock] # subset of one the stock
aapl aapl.head()
```
date
01/02/2020    75.087502
01/03/2020    74.357498
01/06/2020    74.949997
01/07/2020    74.597504
01/08/2020    75.797501
Name: AAPL.Close, dtype: float64
```
This is the Efficient Market Hypothesis (EMH) test for one stock:
\[\begin{align} ret =\beta_{0}+\beta_{1}ret_{-1}+\beta_{2}ret_{-2}+u\ (2) \end{align}\]
where \(ret\) is the return of the stock, \(ret_{-1}\) is the return lagged one period, and \(ret_{-2}\) is the return lagged two periods.
To run the previous model, we need to estimate the returns, create the lagged variables, and store them in a data frame. We estimate the arithmetic return: \((P_t/P_{t-1})-1\).
```python
ret = aapl.pct_change()
ret.head()
```
```
date
01/02/2020         NaN
01/03/2020   -0.009722
01/06/2020    0.007968
01/07/2020   -0.004703
01/08/2020    0.016086
Name: AAPL.Close, dtype: float64
```
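The `pct_change` method is equivalent to computing the arithmetic return by hand with a one-period shift; a quick sketch to verify this on the same series (it should print True, up to floating-point rounding):

```python
import numpy as np

# Manual arithmetic return: (P_t / P_{t-1}) - 1
ret_manual = aapl / aapl.shift(1) - 1

# Compare with pct_change, treating the leading NaN as equal
print(np.allclose(ret_manual, ret, equal_nan=True))
```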
Here we create the lags of the return variable:
```python
lag1 = ret.shift(-1)
lag1 = lag1.rename("lag_1")  # the first lag, r(t-1)

lag2 = ret.shift(-2)
lag2 = lag2.rename("lag_2")  # the second lag, r(t-2)
```
Then, we concatenate the three variables.
```python
# Note: the name `all` shadows Python's built-in all(); we keep the
# book's variable name here for consistency with later code
all = pd.concat([ret, lag1, lag2], axis=1)
all.head()
```
date | AAPL.Close | lag_1 | lag_2
---|---|---|---
01/02/2020 | NaN | -0.009722 | 0.007968 |
01/03/2020 | -0.009722 | 0.007968 | -0.004703 |
01/06/2020 | 0.007968 | -0.004703 | 0.016086 |
01/07/2020 | -0.004703 | 0.016086 | 0.021241 |
01/08/2020 | 0.016086 | 0.021241 | 0.002261 |
Now we run the OLS model, but first we eliminate the missing values.
```python
all.dropna(inplace=True)  # drop the missing values
todas2 = sm.add_constant(all)  # add the constant to the model
y = todas2[stock]  # define the dependent variable y
X = todas2.loc[:, ["const", "lag_1", "lag_2"]]  # define X by selecting the independent variables
model = sm.OLS(y, X)  # set up the OLS model
res = model.fit()  # estimate the parameters of the OLS model
print(res.summary())  # print the summary of the output
```
```
                            OLS Regression Results
==============================================================================
Dep. Variable:             AAPL.Close   R-squared:                       0.034
Model:                            OLS   Adj. R-squared:                  0.031
Method:                 Least Squares   F-statistic:                     10.65
Date:                Thu, 21 Nov 2024   Prob (F-statistic):           2.86e-05
Time:                        13:32:39   Log-Likelihood:                 1419.3
No. Observations:                 603   AIC:                            -2833.
Df Residuals:                     600   BIC:                            -2819.
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0015      0.001      1.615      0.107      -0.000       0.003
lag_1         -0.1800      0.041     -4.410      0.000      -0.260      -0.100
lag_2          0.0214      0.041      0.525      0.600      -0.059       0.102
==============================================================================
Omnibus:                       54.729   Durbin-Watson:                   1.997
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              304.988
Skew:                           0.008   Prob(JB):                     5.93e-67
Kurtosis:                       6.484   Cond. No.                         47.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
```
Remember, a significant \(\beta_1\) coefficient would reject the EMH. In this case, the coefficient of lag 1 is significant at the 1% level, while the coefficient of lag 2 is insignificant. For this example, we can use the previous day's return on Apple to predict tomorrow's return, but not the return from two days before. If both coefficients were significant, the market would not be efficient for Apple, because we could use historical information to predict future returns. If both coefficients were insignificant, we could say that the market is efficient for Apple, and historical information would not be helpful for predicting the future return. We mentioned that one advantage of this test is that it is easy to understand. However, as in the Apple case, when one coefficient is significant and the other is not, the EMH test is inconclusive, so we need another EMH test that corrects that issue, which is what the next EMH test for variance does.
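Before moving on, note that the coefficient p-values can also be read programmatically, and both lags can be summarized in a single joint F-test of \(H_0: \beta_1=\beta_2=0\). This is not the test the chapter relies on, just a complementary sketch using the fitted `res` object from above:

```python
# Individual p-values of const, lag_1 and lag_2
print(res.pvalues)

# Joint F-test of H0: beta_1 = beta_2 = 0; a small p-value means the
# lags jointly help predict the return (evidence against the EMH)
print(res.f_test("lag_1 = 0, lag_2 = 0"))
```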
3.3 EMH test for variance for one asset
For some financial time series, such as stock returns, the expected return may not depend on past returns (the market is efficient), but the variance of returns may. For example, in the model:
\[r_t= \beta_0 +\beta_1\ r_{t-1}+\beta_2\ r_{t-2}+u_t\ (3)\]
we could apply a test to verify whether the variance of returns has an effect on the returns:
\[u^2_{t}= \delta_0 +\delta_1\ r_{t-1}+\delta_2\ r_{t-2}+e_t\ (3.1)\]
The previous model is the auxiliary regression of a heteroskedasticity test, so we can apply the Breusch-Pagan (BP) test for heteroskedasticity. The intuition behind using a heteroskedasticity test for the EMH is that the null hypothesis of a test such as BP in equation 3.1 is:
\[\delta_1=\delta_2=0\]
Then, for a small p-value of the BP test, we reject the null hypothesis and the model is heteroskedastic. In terms of the EMH, this implies that there is evidence that the variance of historical returns has an effect on today's return, and therefore we can use the historical information of the return to predict future returns.
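To see where the BP statistic comes from, we can run the auxiliary regression (3.1) ourselves: regress the squared OLS residuals on the same regressors and read the overall F-test. A minimal sketch using the `res` and `X` objects from the previous section; its F-statistic and p-value should match the F version of the BP test reported below:

```python
# Auxiliary regression (3.1): squared residuals on the same regressors.
# Its overall F-test of delta_1 = delta_2 = 0 is the BP F-test.
aux = sm.OLS(res.resid ** 2, X).fit()
print(aux.fvalue, aux.f_pvalue)
```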
Here we apply the BP test.
```python
test = sms.het_breuschpagan(res.resid, res.model.exog)
test
```
```
(0.4445405267105208,
 0.8006989355087222,
 0.22132760713799005,
 0.8015194789663879)
```
We have four statistics: the Lagrange multiplier (LM) statistic and its p-value, and the F-statistic and its p-value. The two versions are usually consistent in their results. For this chapter, we use the F version. This is the F-statistic:

```python
test[2]
```

```
0.22132760713799005
```
This is the F p-value:

```python
test[3]
```

```
0.8015194789663879
```
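Instead of indexing the tuple by position, you can unpack the four statistics into named variables; a small readability sketch (the variable names are our choice):

```python
# LM statistic, LM p-value, F statistic, F p-value
lm_stat, lm_pval, f_stat, f_pval = test
print(f_stat, f_pval)
```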
In the previous output, we see from the F p-value that we do not reject the null hypothesis, so the model is homoskedastic. Regarding the EMH, this implies that there is evidence that the variance of historical returns does not affect today's return, and we cannot use the historical information of the return to predict future returns. The implication for our portfolio is that we must exclude Apple from it.
3.4 EMH tests for the variance for many assets to build a portfolio
In this chapter, we filter for the stocks for which we can make a prediction: regarding the EMH test for variance, those with a small p-value on the BP test. For this chapter, small means a p-value of 0.1 or less.
The next code builds a for loop to perform the EMH test for variance on many assets. First, we get the tickers' names.
```python
tickers = list(data.columns)  # tickers' names
tickers[:5]
```

```
['AAPL.Close', 'MSFT.Close', 'GOOG.Close', 'GOOGL.Close', 'AMZN.Close']
```
```python
pval = []   # F p-values
fstat = []  # F statistics
names = []  # tickers

for stock in tickers:
    ap = data[[stock]]
    ret = ap.pct_change()

    lag1 = ret.shift(-1)
    lag1 = lag1.rename(columns={stock: "lag_1"})

    lag2 = ret.shift(-2)
    lag2 = lag2.rename(columns={stock: "lag_2"})

    ret2 = pd.concat([ret[stock], lag1, lag2], axis=1)
    ret2.dropna(inplace=True)

    data3 = sm.add_constant(ret2)
    y = data3[stock]
    X = data3.loc[:, ["const", "lag_1", "lag_2"]]
    model = sm.OLS(y, X)
    res = model.fit()

    test = sms.het_breuschpagan(res.resid, res.model.exog)
    pval.append(test[3])   # store the F p-value
    fstat.append(test[2])  # store the F statistic
    names.append(stock)
```
We transform the results into a data frame.
```python
df = pd.DataFrame({"pval": pval, "fstat": fstat}, index=names)
df.head()
```
 | pval | fstat
---|---|---
AAPL.Close | 0.801519 | 0.221328 |
MSFT.Close | 0.113536 | 2.183545 |
GOOG.Close | 0.555914 | 0.587716 |
GOOGL.Close | 0.771132 | 0.260009 |
AMZN.Close | 0.626187 | 0.468472 |
Here we apply the filter to keep the stocks with an F p-value of 0.1 or less.
=df[df["pval"]<=0.1]
df2 df2.head()
 | pval | fstat
---|---|---
TSLA.Close | 0.001851 | 6.358248 |
TSM.Close | 0.025103 | 3.707484 |
JNJ.Close | 0.000003 | 12.999178 |
UNH.Close | 0.039176 | 3.257248 |
JPM.Close | 0.052720 | 2.957243 |
```python
df2.tail()
```
 | pval | fstat
---|---|---
NKE.Close | 7.981519e-03 | 4.869728 |
INTC.Close | 2.204589e-02 | 3.838985 |
C.PJ.Close | 5.401396e-28 | 69.839166 |
TMUS.Close | 6.114786e-03 | 5.140592 |
TXN.Close | 3.416205e-03 | 5.733322 |
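As a quick sanity check before building the price data frame, we can count how many tickers pass the filter; the result should match the 56 columns of the data frame shown below:

```python
# Number of stocks that pass the BP filter at the 0.1 level
print(len(df2))
```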
For the portfolio's next stage, we keep the stocks in the data frame df2. We will also need the prices and returns of those stocks, so we build a data frame with their prices.
```python
data2 = data.loc[:, df2.index]
data2.head()
```
date | TSLA.Close | TSM.Close | JNJ.Close | UNH.Close | JPM.Close | TCEHY.Close | TCTZF.Close | XOM.Close | BAC.Close | PG.Close | ... | ACN.Close | CSCO.Close | LRLCF.Close | CICHF.Close | MCD.Close | NKE.Close | INTC.Close | C.PJ.Close | TMUS.Close | TXN.Close
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
01/02/2020 | 86.052002 | 60.040001 | 145.970001 | 292.500000 | 141.089996 | 49.880001 | 49.880001 | 70.900002 | 35.639999 | 123.410004 | ... | 210.149994 | 48.419998 | 293.450012 | 0.87 | 200.789993 | 102.199997 | 60.840000 | 28.570000 | 78.589996 | 129.570007 |
01/03/2020 | 88.601997 | 58.060001 | 144.279999 | 289.540009 | 138.339996 | 49.029999 | 48.930000 | 70.330002 | 34.900002 | 122.580002 | ... | 209.800003 | 47.630001 | 297.130005 | 0.84 | 200.080002 | 101.919998 | 60.099998 | 28.719999 | 78.169998 | 127.849998 |
01/06/2020 | 90.307999 | 57.389999 | 144.100006 | 291.549988 | 138.229996 | 48.770000 | 48.700001 | 70.870003 | 34.849998 | 122.750000 | ... | 208.429993 | 47.799999 | 293.000000 | 0.84 | 202.330002 | 101.830002 | 59.930000 | 28.719999 | 78.620003 | 126.959999 |
01/07/2020 | 93.811996 | 58.320000 | 144.979996 | 289.790009 | 135.880005 | 49.779999 | 49.770000 | 70.290001 | 34.619999 | 121.989998 | ... | 203.929993 | 47.490002 | 288.549988 | 0.88 | 202.630005 | 101.779999 | 58.930000 | 28.629999 | 78.919998 | 129.410004 |
01/08/2020 | 98.428001 | 58.750000 | 144.960007 | 295.899994 | 136.940002 | 49.650002 | 49.650002 | 69.230003 | 34.970001 | 122.510002 | ... | 204.330002 | 47.520000 | 287.500000 | 0.88 | 205.910004 | 101.550003 | 58.970001 | 28.709999 | 79.419998 | 129.759995 |
5 rows × 56 columns
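Since the next stage also needs the returns of the filtered stocks, one way to prepare them is the following sketch (the name `returns2` is our choice; the next chapters may construct this differently):

```python
# Arithmetic returns of the filtered stocks for the next stage
returns2 = data2.pct_change().dropna()
returns2.head()
```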