3 Rational agent and behavioural finance

Throughout the chapter we use the following libraries:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.stats.api as sms
```
This chapter and the next two aim to build a portfolio using rational agent and behavioural finance theory and the market anomalies theory. The idea is to start with several assets (stocks) and filter them as follows.
Step 1. Market efficiency filter. We test each stock to verify whether we can use its historical information to forecast its price.
Step 2. Market anomalies filter. The strategy consists of buying assets when their prices are trending up and selling them when they are trending down. We pick the stocks that contribute to increasing the portfolio return.
Step 3. Portfolio allocation. Finally, we optimize the weights of the selected assets.
It is important to note that this is not the traditional Markowitz portfolio construction, where we only optimize the weights of the portfolio's assets. First we pick stocks, through what this book calls filters, that fulfill certain assumptions related to rational agent theory and market anomalies theory. After that stock picking, we optimize the weights of the assets.
3.1 Introduction to rational agent and behavioural finance
Around 1970, economists argued that an efficient market should instantaneously reflect all the available information about a particular financial security, a claim known as the Efficient Market Hypothesis, EMH (Fama 1970). Under this view, arbitrage opportunities should be rare, or, as they used to argue, markets were not predictable. Academics were reasonably content with the EMH until the stock market behaved bizarrely in 1987, the year of the Dow Jones Industrial Average's historic collapse. What is interesting about 1987 is that trading folklore and the activities of leading academic economists fit the behavioural finance point of view, not the EMH point of view.
Economists actively discussing and acting in financial markets seemed to believe that markets were predictable, a fundamental principle of modern behavioural finance (Burton and Shah 2013). By designing systematic trading platforms, some traders aim to generate signals that consistently produce positive outcomes over many trades, and successful trading systems are usually tested on large amounts of historical data. A more scientific method for analyzing a particular financial security is to determine whether its price changes are random. If the price changes are random, the probability of detecting a consistently profitable trading opportunity for that security is negligible. On the other hand, if the price changes are non-random, the security exhibits persistent predictability and should be analyzed further. It is then possible to measure the relative availability of trading opportunities with market inefficiency tests (Aldridge 2010). In summary, if the tests detect that new information is incorporated slowly into asset prices, arbitrage opportunities exist and the market is inefficient.
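As a rough illustration of such randomness checks, a minimal sketch is a Ljung-Box autocorrelation test on daily returns with statsmodels. The return series below is synthetic, purely for demonstration; replace it with real data:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.diagnostic import acorr_ljungbox

# Synthetic daily returns (random by construction); replace with real data
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0, 0.01, 500))

# Ljung-Box test: H0 = no autocorrelation up to the chosen lags.
# A small p-value suggests non-random, persistent price changes.
print(acorr_ljungbox(returns, lags=[5], return_df=True))
```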
In this chapter, we apply the EMH test proposed by Wooldridge (2020) to identify arbitrage opportunities, that is, to find inefficient markets.
3.2 EMH test on historical returns for one asset
Suppose \(y_{t}\) is the daily price of the S&P 500. A strict form of the Efficient Market Hypothesis (EMH) establishes that historical information on the index before day t should not help predict the index. If we use only past information on \(y_{t}\), we can test this with the model:
\[y_t= \beta_0 +\beta_1\ y_{t-1} + \beta_2\ y_{t-2}+u_t\ (1)\]
where the right-hand side is the expected value of \(y_{t}\) given the historical information of the index, \(y_{t-1}, y_{t-2}, \ldots\). Under the EMH, that expected value does not depend on its own history, so \(\beta_1=\beta_2=0\); if the coefficients are significantly different from zero, historical information helps predict the current price and the market is not efficient. One advantage of this test is that it is easy to understand, assuming that you have at least basic econometric knowledge.
Suppose that we want to run the EMH test on the returns of AAPL (Apple).
For this chapter we use daily data for several stocks from January 2020 to May 2022.
```python
data = pd.read_csv("https://raw.githubusercontent.com/abernal30/AFP_py/refs/heads/main/data/1Rational_agent.csv", index_col=0)
data
```
date | AAPL.Close | MSFT.Close | GOOG.Close | GOOGL.Close | AMZN.Close | TSLA.Close | BRK.A.Close | BRK.B.Close | FB.Close | TSM.Close | ... | TMUS.Close | PM.Close | AMD.Close | LIN.Close | TXN.Close | CRM.Close | BMY.Close | UPS.Close | RLLCF.Close | QCOM.Close
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
01/02/2020 | 75.087502 | 160.619995 | 1367.369995 | 1368.680054 | 1898.010010 | 86.052002 | 342261 | 228.389999 | 209.779999 | 60.040001 | ... | 78.589996 | 85.190002 | 49.099998 | 210.740005 | 129.570007 | 166.990005 | 63.340000 | 116.790001 | 0.0046 | 88.690002 |
01/03/2020 | 74.357498 | 158.619995 | 1360.660034 | 1361.520020 | 1874.969971 | 88.601997 | 339155 | 226.179993 | 208.669998 | 58.060001 | ... | 78.169998 | 85.029999 | 48.599998 | 205.259995 | 127.849998 | 166.169998 | 62.779999 | 116.720001 | 0.0100 | 87.019997 |
01/06/2020 | 74.949997 | 159.029999 | 1394.209961 | 1397.810059 | 1902.880005 | 90.307999 | 340210 | 226.990005 | 212.600006 | 57.389999 | ... | 78.620003 | 86.019997 | 48.389999 | 204.389999 | 126.959999 | 173.449997 | 62.980000 | 116.199997 | 0.0217 | 86.510002 |
01/07/2020 | 74.597504 | 157.580002 | 1393.339966 | 1395.109985 | 1906.859985 | 93.811996 | 338901 | 225.919998 | 213.059998 | 58.320000 | ... | 78.919998 | 86.400002 | 48.250000 | 204.830002 | 129.410004 | 176.000000 | 63.930000 | 116.000000 | 0.0126 | 88.970001 |
01/08/2020 | 75.797501 | 160.089996 | 1404.319946 | 1405.040039 | 1891.969971 | 98.428001 | 339188 | 225.990005 | 215.220001 | 58.750000 | ... | 79.419998 | 88.040001 | 47.830002 | 207.389999 | 129.759995 | 177.330002 | 63.860001 | 116.660004 | 0.0099 | 88.709999 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
05/20/2022 | 137.589996 | 252.559998 | 2186.260010 | 2178.159912 | 2151.820068 | 663.900024 | 456500 | 304.049988 | 193.539993 | 90.779999 | ... | 126.040001 | 101.150002 | 93.500000 | 315.179993 | 169.809998 | 159.649994 | 76.190002 | 171.039993 | 0.0072 | 131.600006 |
05/23/2022 | 143.110001 | 260.649994 | 2233.330078 | 2229.760010 | 2151.139893 | 674.900024 | 464510 | 310.200012 | 196.229996 | 91.500000 | ... | 129.889999 | 102.910004 | 95.070000 | 320.420013 | 169.929993 | 160.320007 | 76.699997 | 174.389999 | 0.0085 | 132.119995 |
05/24/2022 | 140.360001 | 259.619995 | 2118.520020 | 2119.399902 | 2082.000000 | 628.159973 | 463606 | 309.170013 | 181.279999 | 88.720001 | ... | 129.220001 | 106.620003 | 91.160004 | 320.489990 | 167.860001 | 156.929993 | 77.129997 | 174.110001 | 0.0075 | 128.529999 |
05/25/2022 | 140.520004 | 262.519989 | 2116.790039 | 2116.100098 | 2135.500000 | 658.799988 | 462890 | 308.640015 | 183.830002 | 90.410004 | ... | 131.440002 | 108.570000 | 92.650002 | 315.850006 | 170.009995 | 159.649994 | 77.239998 | 173.860001 | 0.0080 | 131.229996 |
05/26/2022 | 143.779999 | 265.899994 | 2165.919922 | 2155.850098 | 2221.550049 | 707.729980 | 468805 | 312.500000 | 191.630005 | 91.000000 | ... | 132.740005 | 108.070000 | 98.750000 | 320.329987 | 174.130005 | 162.460007 | 77.589996 | 178.380005 | 0.0085 | 134.839996 |
606 rows × 100 columns
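Note that the dates are read in as plain strings in MM/DD/YYYY format, and the tests below rely only on row order. If you later need date arithmetic or resampling, an optional step, not used in the rest of this chapter, is to parse the index; a minimal sketch assuming the format shown above:

```python
# Optional: convert the MM/DD/YYYY string index into a DatetimeIndex
data.index = pd.to_datetime(data.index, format="%m/%d/%Y")
```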
="AAPL.Close"
stock=data[stock] # subset of one the stock
aapl aapl.head()
```
date
01/02/2020    75.087502
01/03/2020    74.357498
01/06/2020    74.949997
01/07/2020    74.597504
01/08/2020    75.797501
Name: AAPL.Close, dtype: float64
```
This is the Efficient Market Hypothesis (EMH) test for one stock:
\[\begin{align} ret =\beta_{0}+\beta_{1}ret_{-1}+\beta_{2}ret_{-2}+u\ (2) \end{align}\]
where \(ret\) is the return of the stock, \(ret_{-1}\) is the return lagged one period, and \(ret_{-2}\) is the return lagged two periods.
To run the previous model, we need to estimate the returns, create the lagged variables, and store them in a data frame. We estimate the arithmetic return: \((P_t/P_{t-1})-1\).
```python
ret = aapl.pct_change()
ret.head()
```
```
date
01/02/2020         NaN
01/03/2020   -0.009722
01/06/2020    0.007968
01/07/2020   -0.004703
01/08/2020    0.016086
Name: AAPL.Close, dtype: float64
```
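The `pct_change` method is equivalent to computing the arithmetic return by hand with a one-period shift; a quick sketch to verify this on the same series (it should print True, up to floating-point rounding):

```python
import numpy as np

# Manual arithmetic return: (P_t / P_{t-1}) - 1
ret_manual = aapl / aapl.shift(1) - 1

# Compare with pct_change, treating the leading NaN as equal
print(np.allclose(ret_manual, ret, equal_nan=True))
```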
Here we create the lags of the return variable:
```python
lag1 = ret.shift(-1)
lag1 = lag1.rename("lag_1")  # the first lag, r(t-1)

lag2 = ret.shift(-2)
lag2 = lag2.rename("lag_2")  # the second lag, r(t-2)
```
Then, we concatenate the three variables.
```python
# Note: the name `all` shadows Python's built-in all(); we keep the
# book's variable name here for consistency with later code
all = pd.concat([ret, lag1, lag2], axis=1)
all.head()
```
date | AAPL.Close | lag_1 | lag_2
---|---|---|---
01/02/2020 | NaN | -0.009722 | 0.007968 |
01/03/2020 | -0.009722 | 0.007968 | -0.004703 |
01/06/2020 | 0.007968 | -0.004703 | 0.016086 |
01/07/2020 | -0.004703 | 0.016086 | 0.021241 |
01/08/2020 | 0.016086 | 0.021241 | 0.002261 |
Now we run the OLS model, but first we eliminate the missing values.
```python
all.dropna(inplace=True)  # drop the missing values
todas2 = sm.add_constant(all)  # add the constant to the model
y = todas2[stock]  # define the dependent variable y
X = todas2.loc[:, ["const", "lag_1", "lag_2"]]  # define X by selecting the independent variables
model = sm.OLS(y, X)  # set up the OLS model
res = model.fit()  # estimate the parameters of the OLS model
print(res.summary())  # print the summary of the output
```
```
                            OLS Regression Results
==============================================================================
Dep. Variable:             AAPL.Close   R-squared:                       0.034
Model:                            OLS   Adj. R-squared:                  0.031
Method:                 Least Squares   F-statistic:                     10.65
Date:                Thu, 21 Nov 2024   Prob (F-statistic):           2.86e-05
Time:                        13:32:39   Log-Likelihood:                 1419.3
No. Observations:                 603   AIC:                            -2833.
Df Residuals:                     600   BIC:                            -2819.
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0015      0.001      1.615      0.107      -0.000       0.003
lag_1         -0.1800      0.041     -4.410      0.000      -0.260      -0.100
lag_2          0.0214      0.041      0.525      0.600      -0.059       0.102
==============================================================================
Omnibus:                       54.729   Durbin-Watson:                   1.997
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              304.988
Skew:                           0.008   Prob(JB):                     5.93e-67
Kurtosis:                       6.484   Cond. No.                         47.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
```
Remember, a significant \(\beta_1\) coefficient would reject the EMH. In this case, the coefficient of lag 1 is significant at the 1% level, while the coefficient of lag 2 is insignificant. For this example, we can use the previous day's return on Apple to predict tomorrow's return, but not the return from two days before. If both coefficients were significant, the market would not be efficient for Apple, because we could use historical information to predict future returns. If both coefficients were insignificant, we could say that the market is efficient for Apple, and historical information would not be helpful for predicting the future return. We mentioned that one advantage of this test is that it is easy to understand. However, as in the Apple case, when one coefficient is significant and the other is not, the EMH test is inconclusive, so we need another EMH test that corrects that issue, which is what the next EMH test for variance does.
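Before moving on, note that the coefficient p-values can also be read programmatically, and both lags can be summarized in a single joint F-test of \(H_0: \beta_1=\beta_2=0\). This is not the test the chapter relies on, just a complementary sketch using the fitted `res` object from above:

```python
# Individual p-values of const, lag_1 and lag_2
print(res.pvalues)

# Joint F-test of H0: beta_1 = beta_2 = 0; a small p-value means the
# lags jointly help predict the return (evidence against the EMH)
print(res.f_test("lag_1 = 0, lag_2 = 0"))
```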
3.3 EMH test for variance for one asset
For some financial time series, such as stock returns, the expected return may not depend on past returns (the market is efficient), but the variance of returns may. For example, in the model:
\[r_t= \beta_0 +\beta_1\ r_{t-1}+\beta_2\ r_{t-2}+u_t\ (3)\]
we could apply a test to verify whether the variance of returns has an effect on the returns:
\[u^2_{t}= \delta_0 +\delta_1\ r_{t-1}+\delta_2\ r_{t-2}+e_t\ (3.1)\]
The previous model is the auxiliary regression of a heteroskedasticity test, so we can apply the Breusch-Pagan (BP) test for heteroskedasticity. The intuition behind using a heteroskedasticity test for the EMH is that the null hypothesis of a test such as BP in equation 3.1 is:
\[\delta_1=\delta_2=0\]
Then, for a small p-value of the BP test, we reject the null hypothesis and the model is heteroskedastic. In terms of the EMH, this implies that there is evidence that the variance of historical returns has an effect on today's return, and therefore we can use the historical information of the return to predict future returns.
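To see where the BP statistic comes from, we can run the auxiliary regression (3.1) ourselves: regress the squared OLS residuals on the same regressors and read the overall F-test. A minimal sketch using the `res` and `X` objects from the previous section; its F-statistic and p-value should match the F version of the BP test reported below:

```python
# Auxiliary regression (3.1): squared residuals on the same regressors.
# Its overall F-test of delta_1 = delta_2 = 0 is the BP F-test.
aux = sm.OLS(res.resid ** 2, X).fit()
print(aux.fvalue, aux.f_pvalue)
```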
Here we apply the BP test.
```python
test = sms.het_breuschpagan(res.resid, res.model.exog)
test
```
```
(0.4445405267105208,
 0.8006989355087222,
 0.22132760713799005,
 0.8015194789663879)
```
We have four statistics: the Lagrange multiplier (LM) statistic and its p-value, and the F-statistic and its p-value. The two versions are usually consistent in their results. For this chapter, we use the F version. This is the F-statistic:

```python
test[2]
```

```
0.22132760713799005
```
This is the F p-value:

```python
test[3]
```

```
0.8015194789663879
```
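Instead of indexing the tuple by position, you can unpack the four statistics into named variables; a small readability sketch (the variable names are our choice):

```python
# LM statistic, LM p-value, F statistic, F p-value
lm_stat, lm_pval, f_stat, f_pval = test
print(f_stat, f_pval)
```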
In the previous output, we see from the F p-value that we do not reject the null hypothesis, so the model is homoskedastic. Regarding the EMH, this implies that there is evidence that the variance of historical returns does not affect today's return, and we cannot use the historical information of the return to predict future returns. The implication for our portfolio is that we must exclude Apple from it.
3.4 EMH tests for the variance for many assets to build a portfolio
In this chapter, we filter for the stocks for which we can make a prediction: regarding the EMH test for variance, those with a small p-value on the BP test. For this chapter, small means a p-value of 0.1 or less.
The next code builds a for loop to perform the EMH test for variance on many assets. First, we get the tickers' names.
```python
tickers = list(data.columns)  # tickers' names
tickers[:5]
```

```
['AAPL.Close', 'MSFT.Close', 'GOOG.Close', 'GOOGL.Close', 'AMZN.Close']
```
```python
pval = []   # F p-values
fstat = []  # F statistics
names = []  # tickers

for stock in tickers:
    ap = data[[stock]]
    ret = ap.pct_change()

    lag1 = ret.shift(-1)
    lag1 = lag1.rename(columns={stock: "lag_1"})

    lag2 = ret.shift(-2)
    lag2 = lag2.rename(columns={stock: "lag_2"})

    ret2 = pd.concat([ret[stock], lag1, lag2], axis=1)
    ret2.dropna(inplace=True)

    data3 = sm.add_constant(ret2)
    y = data3[stock]
    X = data3.loc[:, ["const", "lag_1", "lag_2"]]
    model = sm.OLS(y, X)
    res = model.fit()

    test = sms.het_breuschpagan(res.resid, res.model.exog)
    pval.append(test[3])   # store the F p-value
    fstat.append(test[2])  # store the F statistic
    names.append(stock)
```
We transform the results into a data frame.
```python
df = pd.DataFrame({"pval": pval, "fstat": fstat}, index=names)
df.head()
```
 | pval | fstat
---|---|---
AAPL.Close | 0.801519 | 0.221328 |
MSFT.Close | 0.113536 | 2.183545 |
GOOG.Close | 0.555914 | 0.587716 |
GOOGL.Close | 0.771132 | 0.260009 |
AMZN.Close | 0.626187 | 0.468472 |
Here we apply the filter to keep the stocks with an F p-value of 0.1 or less.
=df[df["pval"]<=0.1]
df2 df2.head()
 | pval | fstat
---|---|---
TSLA.Close | 0.001851 | 6.358248 |
TSM.Close | 0.025103 | 3.707484 |
JNJ.Close | 0.000003 | 12.999178 |
UNH.Close | 0.039176 | 3.257248 |
JPM.Close | 0.052720 | 2.957243 |
```python
df2.tail()
```
 | pval | fstat
---|---|---
NKE.Close | 7.981519e-03 | 4.869728 |
INTC.Close | 2.204589e-02 | 3.838985 |
C.PJ.Close | 5.401396e-28 | 69.839166 |
TMUS.Close | 6.114786e-03 | 5.140592 |
TXN.Close | 3.416205e-03 | 5.733322 |
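As a quick sanity check before building the price data frame, we can count how many tickers pass the filter; the result should match the 56 columns of the data frame shown below:

```python
# Number of stocks that pass the BP filter at the 0.1 level
print(len(df2))
```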
For the portfolio's next stage, we keep the stocks in the data frame df2. We will also need the prices and returns of those stocks, so we build a data frame with their prices.
```python
data2 = data.loc[:, df2.index]
data2.head()
```
date | TSLA.Close | TSM.Close | JNJ.Close | UNH.Close | JPM.Close | TCEHY.Close | TCTZF.Close | XOM.Close | BAC.Close | PG.Close | ... | ACN.Close | CSCO.Close | LRLCF.Close | CICHF.Close | MCD.Close | NKE.Close | INTC.Close | C.PJ.Close | TMUS.Close | TXN.Close
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
01/02/2020 | 86.052002 | 60.040001 | 145.970001 | 292.500000 | 141.089996 | 49.880001 | 49.880001 | 70.900002 | 35.639999 | 123.410004 | ... | 210.149994 | 48.419998 | 293.450012 | 0.87 | 200.789993 | 102.199997 | 60.840000 | 28.570000 | 78.589996 | 129.570007 |
01/03/2020 | 88.601997 | 58.060001 | 144.279999 | 289.540009 | 138.339996 | 49.029999 | 48.930000 | 70.330002 | 34.900002 | 122.580002 | ... | 209.800003 | 47.630001 | 297.130005 | 0.84 | 200.080002 | 101.919998 | 60.099998 | 28.719999 | 78.169998 | 127.849998 |
01/06/2020 | 90.307999 | 57.389999 | 144.100006 | 291.549988 | 138.229996 | 48.770000 | 48.700001 | 70.870003 | 34.849998 | 122.750000 | ... | 208.429993 | 47.799999 | 293.000000 | 0.84 | 202.330002 | 101.830002 | 59.930000 | 28.719999 | 78.620003 | 126.959999 |
01/07/2020 | 93.811996 | 58.320000 | 144.979996 | 289.790009 | 135.880005 | 49.779999 | 49.770000 | 70.290001 | 34.619999 | 121.989998 | ... | 203.929993 | 47.490002 | 288.549988 | 0.88 | 202.630005 | 101.779999 | 58.930000 | 28.629999 | 78.919998 | 129.410004 |
01/08/2020 | 98.428001 | 58.750000 | 144.960007 | 295.899994 | 136.940002 | 49.650002 | 49.650002 | 69.230003 | 34.970001 | 122.510002 | ... | 204.330002 | 47.520000 | 287.500000 | 0.88 | 205.910004 | 101.550003 | 58.970001 | 28.709999 | 79.419998 | 129.759995 |
5 rows × 56 columns
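Since the next stage also needs the returns of the filtered stocks, one way to prepare them is the following sketch (the name `returns2` is our choice; the next chapters may construct this differently):

```python
# Arithmetic returns of the filtered stocks for the next stage
returns2 = data2.pct_change().dropna()
returns2.head()
```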