3  Rational agent and behavioural finance

This chapter and the next two chapters aim to build a portfolio using the Rational agent and behavioural finance theory and the markets anomalies theory. The idea is to start with several assets (stocks) and filter for those assets based on the following.

It is important to note that this is not the traditional Markowitz portfolio building, where we only optimize the weights of the portfolio’s assets. First, we do a stock pick, which we call filters in this book, that fulfill some assumptions. Those assumptions are related to rational agent theory and market anomalies theory. After that stock picking, we optimize the weights of the assets.

3.1 Introduction to rational agent and behavioural finance

Around 1970, economists argued that an efficient market should instantaneously reflect all the available information of a particular financial security, called the Efficient Market Hypothesis EMH (Fama 1970). Then, arbitrage opportunities were challenging to exist, or they used to argue that those markets were not predictable. Academics were reasonably content with the EMH until in1987 stock market behavior in 1987 was bizarre. The year began with the Dow Jones Industrial Average’s historic collapse. What is interesting about 1987 is that trading folklore and the activities of leading academic economists fit the behavioral finance point of view, not the EMH point of view.

Economists actively discussing and acting in financial markets seemed to believe that markets were predictable, a fundamental principle of modern behavioral finance (Burton and Shah 2013). By designing systematic trading platforms, some traders and trading systems aim to generate signals that consistently produce positive outcomes over many trades. Usually, the trades test successful trading systems on large amounts of past historical data. A more scientific method for analyzing a particular financial security may be determining whether the security price changes are random. If the price changes are random, the probability of detecting a consistently profitable trading opportunity for that particular security is negligible. On the other hand, if the price changes are non-random, the financial security has persistent predictability and should be analyzed further. Then, it is possible to measure the relative availability of trading opportunities with the market inefficiency tests (Aldridge 2010). In summary, if the tests detect that new information takes in slowly in the asset prices, arbitrage opportunities exist, and the market is inefficient.

In this chapter, we apply the EMH test proposed by (Wooldridge 2020) to identify arbitrage opportunities or to find inefficient markets.

3.2 EMH test on historical returns for one asset

Suppose \(y_{t}\) is the daily price of the S&P500. A strict form of the Efficient Markets Hypothesis EMH establishes that the historical information on the index before day t should not help predict the index. If we use only past information on \(y_{t}\), a market is efficient if the following is true:

\[y_t= \beta_0 +\beta_1\ y_{t-1} + \beta_2\ y_{t-2}+u_t\ (1)\]

Where the term on the right is the expected value of \(y_{t}\), given the historical information of the index \(y_{t-1} ,y_{t-2},....\). In other words, the expected value does depend on its own historical information. However, if the previous equation is false, it implies that we could not use the information to predict the current price. One advantage of this test is that is easy to understand, assuming that you have at least basic econometric knowledge.

Suppose that we want to make the EMH test on the returns of the AAPL.

import pandas as pd
import statsmodels.api as sm
import statsmodels.stats.api as sms

For this chapter we use daily data for several stocks from Jan. 2020 to May. 2020.

data=pd.read_csv("https://raw.githubusercontent.com/abernal30/AFP_py/refs/heads/main/data/1Rational_agent.csv",index_col=0)
data#.head()
AAPL.Close MSFT.Close GOOG.Close GOOGL.Close AMZN.Close TSLA.Close BRK.A.Close BRK.B.Close FB.Close TSM.Close ... TMUS.Close PM.Close AMD.Close LIN.Close TXN.Close CRM.Close BMY.Close UPS.Close RLLCF.Close QCOM.Close
date
01/02/2020 75.087502 160.619995 1367.369995 1368.680054 1898.010010 86.052002 342261 228.389999 209.779999 60.040001 ... 78.589996 85.190002 49.099998 210.740005 129.570007 166.990005 63.340000 116.790001 0.0046 88.690002
01/03/2020 74.357498 158.619995 1360.660034 1361.520020 1874.969971 88.601997 339155 226.179993 208.669998 58.060001 ... 78.169998 85.029999 48.599998 205.259995 127.849998 166.169998 62.779999 116.720001 0.0100 87.019997
01/06/2020 74.949997 159.029999 1394.209961 1397.810059 1902.880005 90.307999 340210 226.990005 212.600006 57.389999 ... 78.620003 86.019997 48.389999 204.389999 126.959999 173.449997 62.980000 116.199997 0.0217 86.510002
01/07/2020 74.597504 157.580002 1393.339966 1395.109985 1906.859985 93.811996 338901 225.919998 213.059998 58.320000 ... 78.919998 86.400002 48.250000 204.830002 129.410004 176.000000 63.930000 116.000000 0.0126 88.970001
01/08/2020 75.797501 160.089996 1404.319946 1405.040039 1891.969971 98.428001 339188 225.990005 215.220001 58.750000 ... 79.419998 88.040001 47.830002 207.389999 129.759995 177.330002 63.860001 116.660004 0.0099 88.709999
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
05/20/2022 137.589996 252.559998 2186.260010 2178.159912 2151.820068 663.900024 456500 304.049988 193.539993 90.779999 ... 126.040001 101.150002 93.500000 315.179993 169.809998 159.649994 76.190002 171.039993 0.0072 131.600006
05/23/2022 143.110001 260.649994 2233.330078 2229.760010 2151.139893 674.900024 464510 310.200012 196.229996 91.500000 ... 129.889999 102.910004 95.070000 320.420013 169.929993 160.320007 76.699997 174.389999 0.0085 132.119995
05/24/2022 140.360001 259.619995 2118.520020 2119.399902 2082.000000 628.159973 463606 309.170013 181.279999 88.720001 ... 129.220001 106.620003 91.160004 320.489990 167.860001 156.929993 77.129997 174.110001 0.0075 128.529999
05/25/2022 140.520004 262.519989 2116.790039 2116.100098 2135.500000 658.799988 462890 308.640015 183.830002 90.410004 ... 131.440002 108.570000 92.650002 315.850006 170.009995 159.649994 77.239998 173.860001 0.0080 131.229996
05/26/2022 143.779999 265.899994 2165.919922 2155.850098 2221.550049 707.729980 468805 312.500000 191.630005 91.000000 ... 132.740005 108.070000 98.750000 320.329987 174.130005 162.460007 77.589996 178.380005 0.0085 134.839996

606 rows × 100 columns

stock="AAPL.Close"
aapl=data[stock] # subset of one the stock
aapl.head()
date
01/02/2020    75.087502
01/03/2020    74.357498
01/06/2020    74.949997
01/07/2020    74.597504
01/08/2020    75.797501
Name: AAPL.Close, dtype: float64

This is the Efficient market hypothesis EMH for one stock

\[\begin{align} ret =\beta_{0}+\beta_{1}ret_{-1}+\beta_{1}ret_{-2}+u\ (2) \end{align}\]

where ret is the return of the stock, \(ret_{-1}\) is the return of the lagged one period, and \(ret_{-2}\) is the return of the lagged two periods

To run the previous model, we need to estimate the returns, create the variables and store them in a data frame. We estimate arithmetic return: (P/P(t-e))-1

ret=aapl.pct_change()
ret.head()
date
01/02/2020         NaN
01/03/2020   -0.009722
01/06/2020    0.007968
01/07/2020   -0.004703
01/08/2020    0.016086
Name: AAPL.Close, dtype: float64

Here we create the lags of the variables, and concatenate them on a data frame.

lag1=ret.shift(-1)
lag1=lag1.rename("lag_1") # This is the first lag  r(t-1)
lag2=ret.shift(-2)
lag2=lag2.rename("lag_2") # This is the second lag  r(t-1)o r(t-2)

Then, we concatenate the three variables.

all=pd.concat([ret,lag1,lag2],axis=1) 
all.head()
AAPL.Close lag_1 lag_2
date
01/02/2020 NaN -0.009722 0.007968
01/03/2020 -0.009722 0.007968 -0.004703
01/06/2020 0.007968 -0.004703 0.016086
01/07/2020 -0.004703 0.016086 0.021241
01/08/2020 0.016086 0.021241 0.002261

Now we run the OLS model, but first we eliminate the missing values.

all.dropna(inplace=True) # Dropping  missing values
todas2=sm.add_constant(all) # Adding the constant to the model
y=todas2[stock] # defining the ariable y
X=todas2.loc[:,("const","lag_1","lag_2")] # Defining X, by select the independent variables
model=sm.OLS(y,X) # Runing the OLS model
res=model.fit() # Estimating the parameters of the OLS model
print(res.summary()) # printing the summary of the output
                            OLS Regression Results                            
==============================================================================
Dep. Variable:             AAPL.Close   R-squared:                       0.034
Model:                            OLS   Adj. R-squared:                  0.031
Method:                 Least Squares   F-statistic:                     10.65
Date:                Thu, 21 Nov 2024   Prob (F-statistic):           2.86e-05
Time:                        13:32:39   Log-Likelihood:                 1419.3
No. Observations:                 603   AIC:                            -2833.
Df Residuals:                     600   BIC:                            -2819.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0015      0.001      1.615      0.107      -0.000       0.003
lag_1         -0.1800      0.041     -4.410      0.000      -0.260      -0.100
lag_2          0.0214      0.041      0.525      0.600      -0.059       0.102
==============================================================================
Omnibus:                       54.729   Durbin-Watson:                   1.997
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              304.988
Skew:                           0.008   Prob(JB):                     5.93e-67
Kurtosis:                       6.484   Cond. No.                         47.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Remember, a significant beta1 coefficient would reject EMH. In this case, the coefficient of lag 1 is significant at 1%, and the coefficient of lag 2 is insignificant. For this example, we can use the previous day’s return on Apple to predict tomorrow’s return. However, we can not use the previous day and the two days before returns to predict tomorrow’s return. If, in this case, the two coefficients were significant, the market would not be efficient for Apple because we can use historical information to predict future returns. If both coefficients were insignificant, we could say that the market is efficient for Apple, and then that historical information to predict the future return would not be helpful. We mentioned that one advantage of this test is that it is easy to understand. However, like in the Apple case, if one coefficient is significant and the other is not, we have an inconclusive EMH test, so we need another EMH test that corrects that issue, which is what the next EMH test for variance does.

3.3 EMH test for variance for one asset

For some financial time series, such as stock returns, the expected returns do may not depend on past returns (the Market is efficient), but the variance of returns may. For example, in the model:

\[r_t= \beta_0 +\beta_1\ r_{t-1}+\beta_2\ r_{t-2}+u_t\ (3)\]

We could apply a test to verify if the variance of returns have an effect o the returns:

\[u^2_{t}= \delta_0 +\delta_1\ r_{t-1}+\delta_2\ r_{t-2}+e_t\ (3.1)\]

The previous model is for heteroskedasticity test, then we could apply Breusch-Pagan (BP) test for heteroskedasticity. The intuition behind applying the heteroskedasticity to test EMH, is because, remember that the null hypothesis in a test such as BP in equation 3.1 is:

\[\delta_0=\delta_1=\delta_2=0\]

Then, for a small p-value of the BP test, we reject the null hypothesis and the model would be heteroscedastic. If that happens, in terms of the EMH it implies that there is evidence that the variance of the historic returns have an effect on the today return, and then we can use the historical information of the return to predict future returns.
Here we aply the BP test.

test = sms.het_breuschpagan(res.resid, res.model.exog)
test
(0.4445405267105208,
 0.8006989355087222,
 0.22132760713799005,
 0.8015194789663879)

We have four statistics: the Lagrange value and its t p-value and the F-value and its p.value. Both are usually consistent in their results. For this chapter, we will use the F-value. This is the F-value

test[2] 
0.22132760713799005

This is the F-pvalue

test[3] 
0.8015194789663879

In the previous output, we see, by the F-pvale, that we do not reject the null hypothesis, then the model is homoscedastic. Regarding the EMH, in this case, it would imply that there is evidence that the variance of the historic returns does not affect today’s return, and we can not use the historical information of the return to predict future returns. The implication for our portfolio is that we must exclude Apple from it.

3.4 EMH tests for the variance for many assets to build a portfolio

In this chapter, we filter those stocks so that we can make a prediction. Regarding the EMH test for variance, the stocks for which there is a small p-value on the BP test. For this chapter, small is a p-value less than 0.1.

The next code is to build a “Loop For” to perform the EMH test for variance for many assets. First, we get the ticker’s names.

tickers=list(data.columns) # tickers names
tickers[:5] 
['AAPL.Close', 'MSFT.Close', 'GOOG.Close', 'GOOGL.Close', 'AMZN.Close']
pval=[] 
fstat=[]
names=[]
for stock in tickers:

  ap=data[[stock]]
  ret=ap.pct_change()
  lag1=ret.shift(-1)  
  lag1=lag1.rename(columns={stock:"lag_1"})

  lag2=ret.shift(-2)  
  lag2=lag2.rename(columns={stock:"lag_2"})

  ret2=pd.concat([ret[stock],lag1,lag2],axis=1)
  ret2.dropna(inplace=True)
  data3=sm.add_constant(ret2)
  y=data3[stock]
  X=data3.loc[:,("const","lag_1","lag_2")]
  model=sm.OLS(y,X)
  res=model.fit()
  test = sms.het_breuschpagan(res.resid, res.model.exog)
  pval.append(test[3]) # To store the  F-pvalue
  fstat.append(test[2])# To store the  F values
  names.append(stock)

We transform the results into a data frame.

df=pd.DataFrame({"pval":pval,"fstat":fstat},index=names)

df.head()
pval fstat
AAPL.Close 0.801519 0.221328
MSFT.Close 0.113536 2.183545
GOOG.Close 0.555914 0.587716
GOOGL.Close 0.771132 0.260009
AMZN.Close 0.626187 0.468472

Here we apply the filter to get the stocks whit a Fp-value less than 0.1.

df2=df[df["pval"]<=0.1]
df2.head()
pval fstat
TSLA.Close 0.001851 6.358248
TSM.Close 0.025103 3.707484
JNJ.Close 0.000003 12.999178
UNH.Close 0.039176 3.257248
JPM.Close 0.052720 2.957243
df2.tail()
pval fstat
NKE.Close 7.981519e-03 4.869728
INTC.Close 2.204589e-02 3.838985
C.PJ.Close 5.401396e-28 69.839166
TMUS.Close 6.114786e-03 5.140592
TXN.Close 3.416205e-03 5.733322

For the portfolio’s next stage, we kept the stocks inside the data frame df2. We will also need the prices and returns of those stocks, so we will build a data frame with their prices.

data2=data.loc[:,df2.index]
data2.head()
TSLA.Close TSM.Close JNJ.Close UNH.Close JPM.Close TCEHY.Close TCTZF.Close XOM.Close BAC.Close PG.Close ... ACN.Close CSCO.Close LRLCF.Close CICHF.Close MCD.Close NKE.Close INTC.Close C.PJ.Close TMUS.Close TXN.Close
date
01/02/2020 86.052002 60.040001 145.970001 292.500000 141.089996 49.880001 49.880001 70.900002 35.639999 123.410004 ... 210.149994 48.419998 293.450012 0.87 200.789993 102.199997 60.840000 28.570000 78.589996 129.570007
01/03/2020 88.601997 58.060001 144.279999 289.540009 138.339996 49.029999 48.930000 70.330002 34.900002 122.580002 ... 209.800003 47.630001 297.130005 0.84 200.080002 101.919998 60.099998 28.719999 78.169998 127.849998
01/06/2020 90.307999 57.389999 144.100006 291.549988 138.229996 48.770000 48.700001 70.870003 34.849998 122.750000 ... 208.429993 47.799999 293.000000 0.84 202.330002 101.830002 59.930000 28.719999 78.620003 126.959999
01/07/2020 93.811996 58.320000 144.979996 289.790009 135.880005 49.779999 49.770000 70.290001 34.619999 121.989998 ... 203.929993 47.490002 288.549988 0.88 202.630005 101.779999 58.930000 28.629999 78.919998 129.410004
01/08/2020 98.428001 58.750000 144.960007 295.899994 136.940002 49.650002 49.650002 69.230003 34.970001 122.510002 ... 204.330002 47.520000 287.500000 0.88 205.910004 101.550003 58.970001 28.709999 79.419998 129.759995

5 rows × 56 columns