One of the fundamental problems in finance is to explain the cross-sectional differences in the expected returns of assets. Previously we noted that investors should be compensated for taking on risk, and as such risky assets should be associated with higher expected rates of return. Hence, a large part of finance is concerned with identifying the *risk factors* that can explain the observed differences in *expected returns*.

This problem is not trivial, as there is no evidence to suggest that firms with high historical standard deviations of returns have also provided high rates of return. In fact, research by Ang et al. (2006) and Ang et al. (2009) shows just the opposite: stocks with high idiosyncratic standard deviations have provided low average returns, both in the United States and in twenty-three other countries.

When considering the relationship between risk and return, we note that the Capital Asset Pricing Model (CAPM) asserts that expected returns are linearly related to a single systematic source that is represented by the risk of the market portfolio, where the “*beta*” refers to the relative measure of market portfolio risk. This would suggest that the cost of equity capital should reflect a risk premium that compensates the firm’s investors for the systematic risk present in the investment.

If we were to consider that the CAPM may provide incorrect results then managers who make use of the CAPM when it overstates the market’s required rates of return will forgo some profitable projects that have a true positive net present value. Eventually, the stock market will discipline these conservative managers by viewing them as under-performers. Conversely, if the CAPM understates project risk premiums, managers using the CAPM will undertake some projects that have a negative net present value, which will destroy shareholders’ wealth. In this case the market will discipline these overly aggressive managers for their under-performance relative to what shareholders demand.

Given that the CAPM may be incorrect and that recent empirical tests have not been kind to the CAPM, it would be worthwhile to consider an alternative model to compute the cost of capital. The most popular of these alternatives is presented by Arbitrage Pricing Theory and Factor Models.

Arbitrage Pricing Theory (APT) acknowledges that the return on the market portfolio may not be the only potential source of systematic risk that affects the returns on equities. This theory was originally proposed by Ross (1976) and suggests that other economy-wide factors could also systematically affect the returns for a large number of securities.^{1} These factors might include news about inflation, interest rates, gross domestic product (GDP), or the unemployment rate. Changes in these factors may affect future corporate profitability, which could result in a change in the measurement of risk and the way that investors discount future cash flows from a particular entity.

Hence, APT asserts that the expected return of a security is linearly related to \(K\) systematic factors, where the exposure to each factor is measured by a factor “*beta*”. That is,

\[\begin{eqnarray} \mathbb{E}_t[\tilde{r}_{i}]=r_{f}+\beta_{i1}\gamma_{1}+\cdots+\beta_{iK}\gamma_{K} \tag{1.1} \end{eqnarray}\]where \(\tilde{r}_{i}\) are random variables for the return of securities that are indexed by \(i\), \(\beta_{ik}\) is the beta or risk exposure on the \(k\)-th factor, and \(\gamma_{k}\) is the factor risk premium, for \(k=1,2, \ldots,K\). When \(K=1\) and \(f_{1}\) is the market portfolio factor, the APT is equivalent to the CAPM. Similarly, if the first asset has a unit beta risk on the second factor, that is \(\beta_{12}=1\), and zero betas for all other factors, then its expected return will be \(\mathbb{E}_t[\tilde{r}_{1}]=r_{f}+\gamma_{2}\) according to equation (1.1). In this case the expected return exceeds the risk-free rate by an extra amount that compensates the investor for taking on the factor risk, \(f_{2}\). This is why \(\gamma_{2}\) is called the *factor risk premium* on \(f_{2}\), or the extra return one earns by taking one unit of beta risk that pertains to a particular factor. Interpretations of the other \(\gamma\)’s follow the same convention.

It is worth noting that there are a number of key differences in the assumptions of the CAPM and the APT. Previously we noted that the CAPM is based on mean-variance theory and assumes that only means and variances (whose calculations include covariances) matter in the portfolio choice. This requires restrictions on either the statistical distribution of the asset returns, such as normality, or restrictions on the form of the utility function, such as the quadratic form. In contrast, the APT does not impose such strong distributional assumptions as it would only assume that the asset returns are affected by a few factors. Moreover, it does not impose any restrictions on the form of the utility function, apart from the trivial assumption that investors prefer more of a good thing.

In the CAPM, all investors are informed about the true means, variances, and covariances of the asset returns, and they all use Markowitz portfolio theory to make their optimal investment decisions. As a result, all investors hold the same market portfolio of risky assets and differ only, depending on their risk aversions, in the allocation of their total wealth between the risk-free asset and the market portfolio. In contrast, the APT assumes that it is sufficient for only some investors to be able to take advantage of arbitrage opportunities. If the assets are mispriced so that the expected returns of these assets deviate from what is accounted for by their beta risk, smart investors can construct an arbitrage portfolio to make abnormal returns. In a competitive market, we assume that there are no “free lunch” arbitrage opportunities, as they would be exploited in an expedient manner by smart investors. Hence, the assets in the economy must only be rewarded for their beta risk exposures, and therefore the APT holds. Higher expected returns on a particular asset are associated with higher systematic risks.

Technically, the APT may be implemented with a \(K\)-factor model. That is, the asset returns are influenced by \(K\) factors in the economy through the linear regression model,

\[\begin{eqnarray} \tilde{r}_{i,t}-r_{f,t}=\alpha_{i}+\beta_{i1}\tilde{f}_{1,t}+\cdots+\beta_{iK}\tilde{f}_{K,t}+\tilde{\varepsilon}_{i,t} \tag{1.2} \end{eqnarray}\]where \(\tilde{f}_{1},\tilde{f}_{2}, \ldots,\tilde{f}_{K}\) are the systematic factors that affect all the asset returns on the left-hand side, \(i=\{1,2, \ldots, N\}\); and \(\tilde{\varepsilon}_{i,t}\) refers to the asset-specific risk. Note that we have placed a tilde over the random asset returns, factors, and specific risks. A tilde indicates that the variable is random, whereas the same notation without the tilde refers to the realisation of the respective variable (i.e. the observed data).

Theoretically, under the assumption of no arbitrage, the asset pricing relation of the APT as given by equation (1.1) must be true. Before embarking on the proof of the APT, we note two important points. First, the return-generating process, equation (1.2), is fundamentally different from the asset pricing relation. The return-generating process is a statistical model used to measure the risk exposures of the asset returns. It does not require drawing an economic conclusion, nor does it say anything about what the expected returns on the assets should be. In other words, the alphas (the \(\alpha_{i}\)’s) in the return-generating process could take on any values, as may be derived from the estimation procedure. Only when the no-arbitrage assumption is imposed can one claim the APT, which says that the alphas should be linearly related to their risk exposures (betas).

Second, the APT does not provide any information about what the specific factors are (or should be), and it does not make any claim regarding the number of factors that should be included. It simply assumes that the returns are driven by the factors, and if the smart investors know the betas (via learning or estimation), then an arbitrage portfolio, which requires no investment but yields a positive return, can be formed if the APT-pricing relation is violated in the market. Hence, in equilibrium if there are no arbitrage opportunities, we should not observe deviations from the APT-pricing relation. In contrast, the CAPM states that a single factor for the market should be included in the model and there should be no arbitrage profits.

To observe the intuition that supports the APT, consider a one-factor model in which there are two assets and assume that the factor model has no errors. In this case, we could have

\[\begin{eqnarray*} \tilde{r}_{1}=\mu_{1}+0.8\tilde{f} \end{eqnarray*}\]and

\[\begin{eqnarray*} \tilde{r}_{2}=\mu_{2}+1.6\tilde{f} \end{eqnarray*}\]where \(\tilde{f}\) is the single factor that affects the return on both assets with \(\mathbb{E}(\tilde{f})=0\), while \(\mu_{1}\) and \(\mu_{2}\) are the expected asset returns for assets 1 and 2, respectively. For example, for asset 1:

\[\begin{eqnarray*} \mathbb{E}(\tilde{r}_{1})=\mu_{1}+0.8 \; \mathbb{E}(\tilde{f})=\mu_{1}+0.8\times 0=\mu_{1} \end{eqnarray*}\]The question that then remains is to work out how \(\mu_{1}\) and \(\mu_{2}\) are related to their betas, 0.8 and 1.6.

With two assets and one factor, a suitable portfolio consisting of the two assets can eliminate the single factor risk. With the current beta values, 0.8 and 1.6, one such portfolio, which we call portfolio \(z\), is:

\[\begin{eqnarray*} \tilde{z}=2\tilde{r}_{1}-\tilde{r}_{2}=2\mu_{1}-\mu_{2} \end{eqnarray*}\]This portfolio is risk free with a constant return \(2\mu_{1}-\mu_{2}\), because the factor exposures cancel (\(2\times 0.8-1.6=0\)). Now assume that there is also a risk-free asset in the market with rate of return \(r_{f}\). An *arbitrage opportunity* exists if these two risk-free investments have different returns. To illustrate, suppose that \(r_{f}\) is 5%. Portfolio \(z\) is risk free, so its return should also be 5%. Suppose instead that portfolio \(z\) provides a 6% return, that is, \(2\mu_{1}-\mu_{2}=6\%\). If this situation existed in the market, investors could borrow funds at the risk-free rate of 5% and invest the borrowed funds in portfolio \(z\) to earn a return of 6%. This would result in a 1% arbitrage profit, without any initial investment.

Similarly, suppose that portfolio \(z\) provides a return of 4% (i.e. \(2\mu_{1}-\mu_{2}=4\%\)). In this case, investors can short portfolio \(z\) and invest the proceeds in the risk-free asset, which provides a 5% return, once again making a 1% arbitrage profit without any initial investment. In either of these cases, we are able to derive an arbitrage profit that is unlikely to be available in a competitive market. Hence, in asset pricing theory, we typically make the assumption that there are no such arbitrage opportunities to earn a certain profit with no risk. The reason for this is that if they were to arise, a group of smart investors would make use of such an opportunity and their activity would result in a change in the price of the respective assets to eliminate the risk-free profit.
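The arbitrage argument above can be checked numerically. The following is a minimal sketch, not part of the original derivation: the expected returns `mu_1` and `mu_2` are hypothetical values chosen so that \(2\mu_{1}-\mu_{2}=6\%\), and the factor draws are simulated purely for illustration.

```python
import numpy as np

# Values taken from the text: betas of 0.8 and 1.6, risk-free rate of 5%.
beta_1, beta_2 = 0.8, 1.6
r_f = 0.05

# Hypothetical expected returns (an assumption) with 2*mu_1 - mu_2 = 6%.
mu_1, mu_2 = 0.09, 0.12

# Simulated zero-mean factor realisations (illustrative only).
rng = np.random.default_rng(0)
f = rng.normal(loc=0.0, scale=0.2, size=5)

# One-factor model without error terms.
r_1 = mu_1 + beta_1 * f
r_2 = mu_2 + beta_2 * f

# Portfolio z: long two units of asset 1, short one unit of asset 2.
z = 2.0 * r_1 - r_2

# Borrow at r_f and invest in z: profit per unit borrowed, no initial outlay.
arbitrage_profit = z - r_f

print(np.round(z, 6))
print(np.round(arbitrage_profit, 6))
```

Whatever value the factor takes, the return on \(z\) does not move, so the 1% spread over the borrowing rate is earned without risk.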

Since the risk-free portfolio \(z\) should have the same return as the risk-free asset, we have:

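\[\begin{eqnarray*} 2\mu_{1}-\mu_{2}=r_{f} \end{eqnarray*}\]Subtracting \(2r_{f}\) from both sides gives \(2(\mu_{1}-r_{f})=\mu_{2}-r_{f}\), and dividing both sides by 1.6 this can be rewritten as: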

\[\begin{eqnarray} \frac{\mu_{1}-r_{f}}{0.8}=\frac{\mu_{2}-r_{f}}{1.6} \tag{2.1} \end{eqnarray}\]This expression says that the expected excess return per unit of beta is the same for the two assets. If we use \(\lambda_{1}\) to denote this common ratio, then equation (2.1) implies:

\[\begin{eqnarray*} \mu_{1} &=& r_{f}+0.8\lambda_{1} \\ \mu_{2} &=& r_{f}+1.6\lambda_{1} \end{eqnarray*}\]Similarly, the same beta pricing relation holds for any asset; that is,

\[\begin{eqnarray} \mu_{i}=r_{f}+\beta_{i}\lambda_{1} \tag{2.2} \end{eqnarray}\]where \(\mu_{i}\) is the expected return on a risky asset with beta risk \(\beta_{i}\) (as long as the one-factor model assumption, \(\tilde{r}_{i}=\mu_{i}+\beta_{i}\tilde{f}\), is true, and there is no arbitrage). Equation (2.2) suggests that the expected excess asset returns are proportional to their beta risks, which is the claim of the APT in the one-factor case. Hence, equation (2.2) resembles the CAPM, despite the fact that it has been derived under assumptions that differ from those of the CAPM.

Consider now the \(K\)-factor model, where we again start from the return-generating process of equation (1.2), now written for the asset returns themselves,

\[\begin{eqnarray} \tilde{r}_{i,t}=\mu_{i}+\beta_{i1}\tilde{f}_{1,t}+\cdots+\beta_{iK}\tilde{f}_{K,t}+\tilde{\varepsilon}_{i,t} \tag{3.1} \end{eqnarray}\]where we assume that the factors have zero means (or the means are subtracted from the factors, as often done in the context of the APT factor model), so that the expected asset returns are given by the \(\mu_{i}\)’s, as in the one-factor example. Now we want to examine under what conditions the pricing relation:

\[\begin{eqnarray} \mathbb{E}[\tilde{r}_{i}]=\mu_{i}=r_{f}+\beta_{i1}\gamma_{1}+\cdots+\beta_{iK}\gamma_{K} \tag{3.2} \end{eqnarray}\]is true. In contrast to the special one-factor model, here we have \(K\) risk factors and also the presence of the unsystematic risk \(\tilde{\varepsilon}_{i,t}\) (i.e., noise).

Nevertheless, the argument for beta pricing in the one-factor case can be extended to the \(K\)-factor model. To see this, consider again the previous one-factor example. For any portfolio weights \(w_{1}\) and \(w_{2}\),

\[\begin{eqnarray} w_{1}+w_{2}=1 \tag{3.3} \end{eqnarray}\]that satisfy the condition of no risk:

\[\begin{eqnarray} w_{1}\beta_{1}+w_{2}\beta_{2}=0 \tag{3.4} \end{eqnarray}\](in our example, \(\beta_{1}=0.8\) and \(\beta_{2}=1.6\)), the portfolio:

\[\begin{eqnarray*} \tilde{z}=w_{1}\tilde{r}_{1}+w_{2}\tilde{r}_{2} \end{eqnarray*}\]will be riskless as before. Hence, it has the risk-free rate of return so that,

\[\begin{eqnarray*} w_{1}\mu_{1}+w_{2}\mu_{2}=r_{f} \end{eqnarray*}\]Using equation (3.3), this condition can be rewritten as:

\[\begin{eqnarray} w_{1}(\mu_{1}-r_{f})+w_{2}(\mu_{2}-r_{f})=0 \tag{3.5} \end{eqnarray}\]Note that any portfolio satisfying equation (3.4) is a portfolio that eliminates factor risk. In terms of linear algebra, equation (3.4) says that the portfolio vector is orthogonal to the beta vector (that is, their inner product is zero). Equation (3.5) says that any vector orthogonal to the beta vector will also be orthogonal to the expected excess return vector. This can only be true if the expected excess return vector can be generated by the betas, that is, the expected excess return vector must be a linear function of the betas. The result is the earlier equation (2.2). What we just provided here is the general reasoning that is applicable to the \(K\)-factor case.
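As a small illustration of conditions (3.3) and (3.4), the weights of the riskless portfolio can be obtained by solving a two-equation linear system. This is a sketch under stated assumptions: the factor risk premium `lam` is a hypothetical number, and the expected returns are generated from the beta-pricing relation (2.2) so that we can confirm equation (3.5) holds.

```python
import numpy as np

betas = np.array([0.8, 1.6])   # betas from the example in the text
r_f = 0.05                     # risk-free rate from the example
lam = 0.04                     # hypothetical factor risk premium (assumption)

# Row 1: weights sum to one, equation (3.3).
# Row 2: zero net factor exposure, equation (3.4).
A = np.vstack([np.ones(2), betas])
b = np.array([1.0, 0.0])
w = np.linalg.solve(A, b)

# Expected returns implied by the beta-pricing relation (2.2).
mu = r_f + betas * lam

z_return = w @ mu              # return on the riskless portfolio
excess_check = w @ (mu - r_f)  # left-hand side of equation (3.5)

print(w, z_return, excess_check)
```

The solution reproduces the weights of portfolio \(z\) used earlier (two units long in asset 1 and one unit short in asset 2), and when expected returns are linear in beta the riskless portfolio earns exactly \(r_{f}\).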

To see why the beta-pricing relation remains true in the \(N\) assets case, consider the portfolio:

\[\begin{eqnarray} \nonumber \tilde{r}_{p}&=&\sum_{i=1}^{N}w_{i}\tilde{r}_{i} \\ &=& \sum_{i=1}^{N}w_{i}\mu_{i}+\left( \sum_{i=1}^{N}w_{i}\beta_{i1} \right)\tilde{f}_{1,t}+\cdots+\left(\sum_{i=1}^{N}w_{i}\beta_{iK}\right)\tilde{f}_{K,t}+ \sum_{i=1}^{N}w_{i}\tilde{\varepsilon}_{i,t} \tag{3.6} \end{eqnarray}\]Suppose now that the portfolio weights are orthogonal to all the betas,

\[\begin{eqnarray} \sum_{i=1}^{N}w_{i}\beta_{ik}=w_{1}\beta_{1k}+w_{2}\beta_{2k}+\cdots+w_{N}\beta_{Nk}=0, \;\;\; k=1,2, \ldots,\ K \tag{3.7} \end{eqnarray}\]Then, by equation (3.6), the only risk that remains in the portfolio is unsystematic,

\[\begin{eqnarray} \tilde{r}_{p}=\sum_{i=1}^{N}w_{i}\mu_{i}+\sum_{i=1}^{N}w_{i}\tilde{\varepsilon}_{i,t} \tag{3.8} \end{eqnarray}\]Under certain conditions, this risk will be approximately zero. To see why, assume all the unsystematic risks are independent across assets and they have the same variance \(\sigma_{u}^{2}\). Then the variance of \(\tilde{r}_{p}\) is:

\[\begin{eqnarray} \sigma^{2}(\tilde{r}_{p})=(w_{1}^{2}+w_{2}^{2}+\cdots+w_{N}^{2})\sigma_{u}^{2} \tag{3.9} \end{eqnarray}\]When there are a large number of assets, the weights can be roughly of the same magnitude of \(1/N\), and hence \(\sigma^{2}(\tilde{r}_{p})\) should be of magnitude of:

\[\begin{eqnarray*} (1/N)^{2}\times N\times\sigma_{u}^{2}=\sigma_{u}^{2}/N \end{eqnarray*}\]which approaches zero as \(N\) approaches infinity. This implies that \(\tilde{r}_{p}\) is almost risk free. Hence, in the absence of arbitrage, the return on \(\tilde{r}_{p}\) should be approximately \(r_{f}\), or

\[\begin{eqnarray} w_{1}(\mu_{1}-r_{f})+w_{2}(\mu_{2}-r_{f})+\cdots+w_{N}(\mu_{N}-r_{f})\approx 0 \tag{3.10} \end{eqnarray}\]As in the one-factor case, this implies an approximate beta-pricing relation. Under some additional assumptions, such as a finite upper bound on the variances of all the risky assets and some restrictions on the utility function, the approximate relation can be exact as given by equation (3.2).
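The diversification step in equation (3.9) is easy to verify numerically. The sketch below is only an order-of-magnitude illustration: it uses exactly equal weights of \(1/N\) and a hypothetical idiosyncratic standard deviation `sigma_u`, and it does not impose the orthogonality conditions (3.7).

```python
import numpy as np

def idiosyncratic_variance(weights, sigma_u):
    """Portfolio variance from independent, equal-variance specific risks, as in (3.9)."""
    weights = np.asarray(weights, dtype=float)
    return np.sum(weights ** 2) * sigma_u ** 2

sigma_u = 0.30  # hypothetical specific-risk standard deviation (assumption)

variances = {}
for n_assets in (10, 100, 1000, 10000):
    w = np.full(n_assets, 1.0 / n_assets)   # equal weights of magnitude 1/N
    variances[n_assets] = idiosyncratic_variance(w, sigma_u)
    print(n_assets, variances[n_assets])
```

Each tenfold increase in the number of assets reduces the residual variance by a factor of ten, in line with \(\sigma_{u}^{2}/N\).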

From a practical point of view, the APT is abstract in that it does not tell investors where to find the factors or how to estimate the associated beta risks. On the other hand, this is precisely the source of the theory's generality. When investing in a group of assets, say in certain sectors or industries, the theory allows investment managers to identify important factors that affect the assets of interest, and to examine whether there are any mispricing opportunities.

More importantly, factor models are widely used in practice as a tool for estimating expected asset returns and their covariance matrix, regardless of the validity of the APT. This is because if market participants can identify those true factors that drive asset returns, they will have much better estimates of the true expected asset returns and the covariance matrix, and hence can form a much better portfolio than otherwise possible. Hence, there is considerable research devoted to analysing factor models in practice by the investment community. There is an intellectual “*arms race*” to find the best portfolio strategies to outperform competitors. In addition, factor models can be used not only for explaining asset returns, but also for predicting future returns. Factor model estimation depends crucially on whether the factors are identified (known) or unidentified (latent), and depends further on the sample size and the number of assets. In this section, we review the factor models in the case of known and latent factors in order to provide an overview.

When there are economy-wide factors that affect the returns on a large number of firms, the influences of these factors on the return to a well-diversified portfolio are still present. The influences of the factors cannot be diversified away. Consequently, the risk premiums on particular securities are determined by the sensitivities of their returns to the economy-wide factors and by the compensations that investors require because of the presence of each of these different risks. To determine these factor risk premiums, researchers construct factor-mimicking portfolios, that is, portfolios that correlate very highly (ideally perfectly) with the economic factors.

The simplest case of factor models is where the \(K\) factors are assumed known or observable, so that we have time-series data on them. In this case, the \(K\)-factor model for the return-generating process,

\[\begin{eqnarray} \tilde{r}_{i,t}-r_{f,t}=\alpha_{i}+\beta_{i1}\tilde{f}_{1t}+\cdots+\beta_{iK}\tilde{f}_{Kt}+\tilde{\varepsilon}_{i,t} \tag{4.1} \end{eqnarray}\]is a multiple regression for each asset, and is a multivariate regression if all of the individual regressions are pooled together. For example, if one believes that the gross domestic product (GDP) is the driving force for a group of stock returns, one would have a one-factor model,

\[\begin{eqnarray*} \tilde{r}_{i,t}-r_{f,t}=\alpha_{i}+\beta_{i1}\tilde{GDP}_{t}+\tilde{\varepsilon}_{i,t} \end{eqnarray*}\]The above equation corresponds to equation (4.1) with \(K=1\) and \(f_1=\tilde{GDP}\). In practice, one can obtain time-series data on both the asset returns and GDP, and then one can run regressions to obtain all the parameters, including in particular the expected returns. Another popular one-factor model is the market model regression:

\[\begin{eqnarray*} \tilde{r}_{i,t}-r_{f,t}=\alpha_{i}+\beta_{i1}(\tilde{r}_{m,t}-r_{f,t})+\tilde{\varepsilon}_{i,t} \end{eqnarray*}\]where \(\tilde{r}_{m,t}\) is the return on a stock market index.
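To make the market model regression concrete, the following sketch estimates \(\alpha_{i}\) and \(\beta_{i1}\) by ordinary least squares. All data are simulated, and the "true" parameter values, sample size, and volatilities are hypothetical choices made for illustration, not estimates from any real market.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 600                                # number of time-series observations (assumption)
true_alpha, true_beta = 0.001, 1.2     # hypothetical parameters used to simulate data

# Simulated excess returns on the market index and on one asset.
mkt_excess = rng.normal(loc=0.005, scale=0.04, size=T)
eps = rng.normal(loc=0.0, scale=0.02, size=T)
asset_excess = true_alpha + true_beta * mkt_excess + eps

# OLS with an intercept: the columns of X are a constant and the market factor.
X = np.column_stack([np.ones(T), mkt_excess])
coef, *_ = np.linalg.lstsq(X, asset_excess, rcond=None)
alpha_hat, beta_hat = coef

# Residuals estimate the asset-specific risk.
resid = asset_excess - X @ coef

print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.3f}")
print(f"residual std = {resid.std(ddof=2):.4f}")
```

With several known factors the only change is that `X` gains one column per factor, which gives the multiple regression of equation (4.1).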

To understand the covariance matrix estimation, it will be useful to write the \(K\)-factor model in matrix form:

\[\begin{eqnarray} \tilde{R}_{t}=\alpha+\beta\tilde{f}_{t}+\tilde{\varepsilon}_{t} \tag{4.2} \end{eqnarray}\]or

\[\begin{eqnarray*} \left[\begin{array}{l} \tilde{r}_{1,t}\\ \vdots\\ \tilde{r}_{N,t} \end{array}\right]=\left[\begin{array}{l} \alpha_{1}\\ \vdots\\ \alpha_{N} \end{array}\right]+\left[\begin{array}{ccc} \beta_{11} & \cdots & \beta_{1K}\\ \vdots & \ddots & \vdots\\ \beta_{N1} & \cdots & \beta_{NK} \end{array}\right]\left[\begin{array}{l} \tilde{f}_{1,t}\\ \vdots\\ \tilde{f}_{K,t} \end{array}\right]+\left[\begin{array}{l} \tilde{\varepsilon}_{1,t}\\ \vdots\\ \tilde{\varepsilon}_{N,t} \end{array}\right] \end{eqnarray*}\]where

- \(\tilde{R}_{t}=\) an \(N\)-vector of asset excess returns
- \(\alpha=\) an \(N\)-vector of the alphas
- \(\beta=\) an \(N\times K\) matrix of betas or factor loadings
- \(\tilde{f}_{t}=\) a \(K\)-vector of the factors
- \(\tilde{\varepsilon}_t=\) an \(N\)-vector of the model residuals

For example, we can write a model with \(N=3\) assets and \(K=2\) factors as:

\[\begin{eqnarray*} \left[\begin{array}{l} \tilde{r}_{1,t}\\ \tilde{r}_{2,t}\\ \tilde{r}_{3,t} \end{array}\right]=\left[\begin{array}{l} \alpha_{1}\\ \alpha_{2}\\ \alpha_{3} \end{array}\right]+\left[\begin{array}{ll} \beta_{11} & \beta_{12}\\ \beta_{21} & \beta_{22}\\ \beta_{31} & \beta_{32} \end{array}\right]\left[\begin{array}{l} \tilde{f}_{1,t}\\ \tilde{f}_{2,t} \end{array}\right]+\left[\begin{array}{l} \tilde{\varepsilon}_{1,t}\\ \tilde{\varepsilon}_{2,t}\\ \tilde{\varepsilon}_{3,t} \end{array}\right] \end{eqnarray*}\]Taking the covariance of both sides of equation (4.2), we obtain the return covariance matrix:

\[\begin{eqnarray} \Sigma=\beta\Sigma_f \beta'+\Sigma_{\varepsilon} \tag{4.3} \end{eqnarray}\]where \(\Sigma_f\) is the \(K\times K\) covariance matrix of the factors, and \(\Sigma_{\varepsilon}\) is the \(N\times N\) covariance matrix of the residuals. The matrix \(\Sigma_f\) can be estimated by the sample covariance matrix of the historical factor data. This works for \(\Sigma_{\varepsilon}\) too, using the regression residuals, if the number of assets \(N\) is small relative to the number of time-series observations \(T\). However, when \(N\) is large relative to \(T\) the sample covariance matrix of the residuals will be poorly behaved.

Usually an additional assumption that the residuals are uncorrelated is imposed, so that \(\Sigma_{\varepsilon}\) becomes a diagonal matrix, and can then be estimated by using the sample variances of the residuals. Plugging the estimates of all the parameters into the right-hand side of equation (4.3), we obtain the covariance matrix needed for applying the mean-variance portfolio analysis.
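The construction of the return covariance matrix from the factor model can be sketched as follows. With \(\beta\) defined as an \(N\times K\) matrix, the systematic part is the \(N\times N\) product \(\beta\Sigma_{f}\beta'\). The loadings, factor covariances, and residual variances below are hypothetical numbers for \(N=3\) assets and \(K=2\) factors, and the helper name `factor_covariance` is our own.

```python
import numpy as np

def factor_covariance(beta, sigma_f, resid_var):
    """Return beta @ Sigma_f @ beta' + diag(resid_var), assuming uncorrelated residuals."""
    beta = np.asarray(beta, dtype=float)
    sigma_f = np.asarray(sigma_f, dtype=float)
    return beta @ sigma_f @ beta.T + np.diag(resid_var)

# Hypothetical inputs (assumptions): N = 3 assets, K = 2 factors.
beta = np.array([[1.0, 0.2],
                 [0.8, -0.1],
                 [1.2, 0.5]])
sigma_f = np.array([[0.04, 0.01],
                    [0.01, 0.02]])
resid_var = np.array([0.010, 0.020, 0.015])

Sigma = factor_covariance(beta, sigma_f, resid_var)
print(Sigma)
```

The factor structure requires \(NK+K(K+1)/2+N\) parameters rather than the \(N(N+1)/2\) free elements of an unrestricted covariance matrix, which is the main practical attraction when \(N\) is large.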

In the estimation of a multifactor model, it is implicitly assumed that the number of time series observations \(T\) is far greater than \(K\), the number of factors. Otherwise, the regressions will perform poorly. For the case in which \(K\) is close to or larger than \(T\) some special treatments are needed.

Fama and French (1992) questioned the ability of the traditional CAPM to explain the cross-section of stock returns in U.S. data. They found that the market value of a firm’s equity (ME), which is its price per share multiplied by the number of shares outstanding (or the firm’s market capitalization), and the ratio of the accounting book value of a firm to its market value (book equity to market equity [BE/ME]) contribute significantly to the explanation of average stock returns.^{2}

During their sample, average returns on firms with small market capitalisations were higher than could be explained by their betas with the market portfolio. Perhaps small firms suffer from a greater lack of communication between the firm’s managers and its investors. This asymmetric information could lead investors to require higher rates of return from small firms. Firms that have high ratios of the book value of their equity to the market value of their equity (so-called value firms) also have higher average returns than can be explained by the CAPM and have outperformed growth stocks (stocks with a low [BE/ME]). Interestingly, these firms often suffer from financial distress. If financial distress tends to systematically occur when investors are more risk averse or face bad times, it may cause investors to demand a risk premium for bearing this risk.

Fama and French’s findings are still the subject of great debate in the economic literature, and not everyone believes the results will hold up to further scrutiny. First, many mutual fund companies offer value funds and small-cap funds, which invest in high book-to-market stocks and small-capitalization stocks, respectively. Hence, individual investors can easily diversify their portfolios along size and value characteristics. Second, Ang and Chen (2007) found little evidence of a value effect in a larger sample than the one used by Fama and French (1992), and several other authors have suggested that the size effect disappeared in the 1980s.

Based on their empirical findings, Fama and French (1995) developed a three-factor model to explain average equity returns. The first factor is the return on the value-weighted market portfolio in excess of the risk-free return, as in the CAPM. The second factor is the difference in the return on a portfolio of small firms and the return on a portfolio of big firms [small minus big (SMB)], in which the ratio of [BE/ME] is held constant in each portfolio. The third factor is the difference between the return on a portfolio of firms with high values of [BE/ME] and the return on a portfolio of firms with low values of [BE/ME] (high minus low [HML]), in which the size of firms is held constant in each portfolio. To find the sensitivities of a firm’s equity return to the three factors, you merely run a regression, just as you do to find the beta in the CAPM. The difference is that now there are three explanatory variables instead of one. The average rates of return on the factor-mimicking portfolios can then be combined with the estimated sensitivities of the equity return to the returns on the factor-mimicking portfolios to provide an estimate of the required rate of return on the equity.
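The two steps described above (estimate the sensitivities by regression, then combine them with the average factor returns) can be sketched as follows. The data are simulated rather than taken from Fama and French, and the factor means, volatilities, loadings, and monthly risk-free rate are all hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 720  # hypothetical number of monthly observations

# Simulated factor returns; columns: market excess return, SMB, HML (assumptions).
factor_means = np.array([0.006, 0.002, 0.003])
factor_vols = np.array([0.045, 0.030, 0.030])
factors = factor_means + rng.normal(scale=factor_vols, size=(T, 3))

# Simulated excess returns for one firm with hypothetical true loadings.
true_loadings = np.array([1.1, 0.4, 0.6])
excess_ret = factors @ true_loadings + rng.normal(scale=0.03, size=T)

# Step 1: regress the firm's excess return on the three factors.
X = np.column_stack([np.ones(T), factors])
coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
alpha_hat, loadings_hat = coef[0], coef[1:]

# Step 2: combine the loadings with the average factor-portfolio returns.
premia_hat = factors.mean(axis=0)
r_f = 0.004  # hypothetical monthly risk-free rate
required_return = r_f + loadings_hat @ premia_hat

print(np.round(loadings_hat, 3), round(float(required_return), 4))
```

The intercept is estimated but deliberately left out of the required return, since under the pricing relation only the factor exposures should be rewarded.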

When Fama and French (1998) applied their model to international data,^{3} they found that two factors, the return on the world market and a global version of the HML factor, sufficed to explain the cross-section of expected returns in 13 countries.

**Example**: The Cost of Equity Capital in the Fama-French Model

Suppose we want to estimate the cost of capital for a firm in Australia that has the same systematic risk as a portfolio of Australian stocks with high book-to-market levels. The estimates from Fama and French (1998) that are used below are a beta of 0.84 on the world market in the CAPM and, in the two-factor model, a beta of 0.90 on the world market together with a beta of 0.59 on the value (HML) factor.

If the current risk-free interest rate is 5% and the world market equity risk premium is 5.93%, the required rate of return for the Australian firm from the CAPM is

\[\begin{eqnarray*} r_{\mathsf{AUS}}=5\% +(0.84\times 5.93\%)= 9.98\% \end{eqnarray*}\]We estimate the premium on the value factor-mimicking portfolio to be 3%. Therefore, the required equity rate of return implied by the Fama-French two-factor model is

\[\begin{eqnarray*} r_{\mathsf{AUS}}=5\% +(0.90\times 5.93\%)+(0.59 \times 3.00\% ) = 12.11\% \end{eqnarray*}\]Notice that the two estimates of the required rate of return on the stock are very different. This is because value firms in Australia have historically provided higher average rates of return than the CAPM would imply. Although the Fama-French model has become quite popular, it remains an empirical model, not grounded in formal theory. With remaining doubts about the validity of the model and no good story for why the value effect would persist, the Fama-French model has not yet been widely adopted in practice.
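The arithmetic of the example can be reproduced with a few lines of code. The numbers are those used in the example; the helper function `required_return` is simply our own shorthand for the risk-free rate plus the sum of beta times premium.

```python
def required_return(r_f, betas, premia):
    """Risk-free rate plus the sum of beta_k * premium_k over the factors."""
    return r_f + sum(b * p for b, p in zip(betas, premia))

r_f = 0.05              # risk-free rate
world_premium = 0.0593  # world market equity risk premium
value_premium = 0.03    # premium on the value factor-mimicking portfolio

# CAPM: a single beta on the world market.
capm_cost = required_return(r_f, [0.84], [world_premium])

# Two-factor model: world market beta and value-factor beta.
ff_cost = required_return(r_f, [0.90, 0.59], [world_premium, value_premium])

print(f"CAPM:       {capm_cost:.2%}")   # 9.98%
print(f"Two-factor: {ff_cost:.2%}")     # 12.11%
print(f"Difference: {ff_cost - capm_cost:.2%}")
```

The gap of a little over two percentage points comes mostly from the value premium term, \(0.59\times 3\%=1.77\%\), with the remainder due to the higher market beta.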

Other prominent multifactor models where known factors are used include

- MSCI Barra fundamental factor model
- Burmeister-Ibbotson-Roll-Ross (BIRR) macroeconomic factor model
- Barclay Group Inc. factor model (which pertains to the bond market)

While some applications use observed factors, some use entirely latent factors, that is, they take the view that the factors \(f_{t}\) in the \(K\)-factor model:

\[\begin{eqnarray*} \tilde{R}_{t}=\alpha+\beta\tilde{f}_{t}+\tilde{\varepsilon}_{t} \end{eqnarray*}\]are not directly observable. An argument for the use of latent factors is that the observed factors may be measured with errors or have been already anticipated by investors. Rather than imposing what \(f_{t}\) are on the basis of a belief that is likely to be incorrect, we can statistically estimate the factors based on the factor model and data.

It is important to understand that the field of statistics incorporates a methodology known as “*factor analysis*” in which the model employed is referred to as a factor model. These are not, of course, the same factors as we have been discussing, but rather represent basic underlying causes referred to here as latent factors. The latent factor models serve two main purposes: (1) they reduce the dimensionality of models to make estimation possible and/or (2) they find the likely true causes that drive data.

In what follows we seek to describe some of the properties of the latent factor model. First, the factors are not uniquely defined in the model, but all sets of factors are linear combinations of each other. This is because if \(\tilde{f}_{t}\) is a set of factors, then, for any \(K\times K\) invertible matrix \(A\), we have:

\[\begin{eqnarray} \tilde{R}_{t}=\alpha+\beta\tilde{f}_{t}+\tilde{\varepsilon}_{t}=\alpha+(\beta A^{-1})(A\tilde{f}_{t})+\tilde{\varepsilon}_{t} \tag{4.4} \end{eqnarray}\]which says that if \(\tilde{f}_{t}\) with regression coefficients \(\beta\) (known as *factor loadings* in the context of factor models) explains the asset returns well, so does \(\tilde{f}_{t}^\star=A\tilde{f}_{t}\) with loadings \(\beta A^{-1}\). The linear transformation \(\tilde{f}_{t}^\star\) of \(\tilde{f}_{t}\) is also known as a *rotation* of \(\tilde{f}_{t}\).
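The non-uniqueness of the factors can be confirmed with a quick numerical check. In this sketch the loadings and factors are random draws and the matrix `A` is an arbitrary invertible matrix, all chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, T = 4, 2, 6                      # small hypothetical dimensions

beta = rng.normal(size=(N, K))         # factor loadings
f = rng.normal(size=(K, T))            # factor realisations, one column per period

# Any invertible K x K matrix defines a rotation (this one has determinant 5.5).
A = np.array([[2.0, 1.0],
              [0.5, 3.0]])

beta_star = beta @ np.linalg.inv(A)    # rotated loadings, beta A^{-1}
f_star = A @ f                         # rotated factors, A f_t

# The systematic part of returns is identical under both representations.
same_fit = np.allclose(beta @ f, beta_star @ f_star)
print(same_fit)
```

Since the fitted systematic component is unchanged, the data alone cannot distinguish between \(\tilde{f}_{t}\) and \(A\tilde{f}_{t}\); additional normalisations are needed to pin the factors down.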

The second property is that we can assume all the factors have zero mean, that is, \(\mathbb{E} [\tilde{f}_{t}]=0\). This is because if \(\mu_{f}=\mathbb{E}[f_{t}]\), then the factor model can be written as:

\[\begin{eqnarray} \tilde{R}_{t}=\alpha+\beta\tilde{f}_{t}+\tilde{\varepsilon}_{t}=(\alpha+\beta\mu_{f})+\beta(\tilde{f}_{t}-\mu_{f})+\tilde{\varepsilon}_{t} \tag{4.5} \end{eqnarray}\]If we rename \(\alpha+\beta\mu_{f}\) as the new alphas, and \(\tilde{f}_{t}-\mu_{f}\) as the new factors, then the new factors will have zero means, and the new factor model is statistically the same as the old one. Hence, without loss of generality, we will assume that the means of the factors are zero in our estimation.

Note that the return covariance matrix formula, equation (4.3) and repeated here as

\[\begin{eqnarray} \Sigma=\beta\Sigma_f\beta'+\Sigma_{\varepsilon} \tag{4.6} \end{eqnarray}\]holds regardless of whether the factors are observable or latent. However, through factor rotation, we can make a new set of factors so as to have the identity covariance matrix. In this case with \(\Sigma_f=I_{K}\), we say that the factor model is *standardised*, and the covariance equation then simply becomes:

\[\begin{eqnarray} \Sigma=\beta\beta'+\Sigma_{\varepsilon} \tag{4.7} \end{eqnarray}\]In general, \(\Sigma_{\varepsilon}\) can have non-zero off-diagonal elements, implying that the residuals are correlated. If we assume that the residuals are uncorrelated, then \(\Sigma_{\varepsilon}\) becomes a diagonal matrix, and the factor model is known as a *strict factor model*. If we assume further that \(\Sigma_{\varepsilon}\) has equal diagonal elements, that is, \(\Sigma_{\varepsilon}=\sigma^{2}I_{N}\) for some \(\sigma>0\) with \(I_{N}\) an \(N\) dimensional identity matrix, then the factor model is known as a *normal factor model*.
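One way to construct a rotation that standardises the factors is through the Cholesky factor of \(\Sigma_{f}\). This is only one of many valid choices (any rotation followed by an orthogonal transformation also works), and the numerical inputs below are hypothetical.

```python
import numpy as np

# Hypothetical factor covariance matrix and loadings (assumptions).
sigma_f = np.array([[0.04, 0.01],
                    [0.01, 0.02]])
beta = np.array([[1.0, 0.2],
                 [0.8, -0.1],
                 [1.2, 0.5]])

# Cholesky factorisation: sigma_f = L @ L.T with L lower triangular.
L = np.linalg.cholesky(sigma_f)

# Rotation A = L^{-1}: the new factors A f_t have identity covariance.
A = np.linalg.inv(L)
sigma_f_star = A @ sigma_f @ A.T

# The rotated loadings are beta A^{-1} = beta L.
beta_star = beta @ L

# The systematic covariance is unchanged: beta Sigma_f beta' = beta* beta*'.
systematic_old = beta @ sigma_f @ beta.T
systematic_new = beta_star @ beta_star.T

print(np.round(sigma_f_star, 10))
print(np.allclose(systematic_old, systematic_new))
```

After the rotation the factor covariance no longer needs to be estimated separately, since all of the systematic covariance is carried by the rotated loadings.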

Rather than taking the view that there are only observable factors or only latent factors, we can consider a more general factor model that combines both,

\[\begin{eqnarray} \tilde{R}_{t}=\alpha+\beta\tilde{f}_{t}+\beta_{g}\tilde{g}_{t}+\tilde{\varepsilon}_{t} \tag{4.8} \end{eqnarray}\]where \(\tilde{f}_{t}\) is a \(K\)-vector of latent factors, \(\tilde{g}_{t}\) is an \(L\)-vector of observable factors, and \(\beta_{g}\) are the betas associated with \(\tilde{g}_{t}\). This model makes intuitive sense. If we believe a few fundamental and macroeconomic factors are the driving forces, we will use them to create the \(\tilde{g}_{t}\) vector. Since these may not account for all the possible factors, we add \(K\) additional unknown factors, which are to be estimated from the data.

The estimation of the above factor model given by equation (4.8) usually involves two steps. In the first step, a regression of the asset returns on the known factors is run in order to obtain \(\hat{\beta}_{g}\), an estimate of \(\beta_{g}\). This allows us to compute the residuals,

\[\begin{eqnarray} \hat{u}_{t}=R_{t}-\hat{\beta}_{g}g_{t} \tag{4.9} \end{eqnarray}\]that is, the difference of the asset returns from their fitted values by using the observed factors for all the time periods. Then, in the second step, a factor estimation approach is used to estimate the latent factors for \(\hat{u}_{t}\),

\[\begin{eqnarray} \tilde{u}_{t}=\alpha+\beta\tilde{f}_{t}+\tilde{\upsilon}_{t} \tag{4.10} \end{eqnarray}\]where \(\tilde{u}_{t}\) is the random vector of differences whose realised values are \(\hat{u}_{t}\). The estimation method for this model is the same as that for a latent factor model. With the factor estimates, we can treat the latent factors as known, and then use equation (4.8) to determine the expected asset returns and covariance matrix.
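The two-step procedure can be sketched as follows on simulated data. The text does not prescribe a particular latent factor estimator, so principal components (computed with a singular value decomposition) is used here purely as an assumption. The sample sizes, parameter values and variable names are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N, K, L_obs = 400, 8, 2, 2             # periods, assets, latent and observable factors

# Simulated data; in practice only R and g are observed.
g = rng.normal(size=(T, L_obs))           # observable factors
f = rng.normal(size=(T, K))               # latent factors
beta_g = rng.normal(size=(N, L_obs))
beta_f = rng.normal(size=(N, K))
alpha = rng.normal(scale=0.01, size=N)
R = alpha + g @ beta_g.T + f @ beta_f.T + rng.normal(scale=0.1, size=(T, N))

# Step 1: regress returns on the observable factors (with an intercept).
X = np.column_stack([np.ones(T), g])
coef, *_ = np.linalg.lstsq(X, R, rcond=None)
beta_g_hat = coef[1:].T                   # N x L_obs estimate of beta_g
u_hat = R - g @ beta_g_hat.T              # residuals as in equation (4.9)

# Step 2: extract K latent factors from the residuals by principal components.
u_centred = u_hat - u_hat.mean(axis=0)    # remove the intercept in equation (4.10)
U, s, Vt = np.linalg.svd(u_centred, full_matrices=False)
f_hat = np.sqrt(T) * U[:, :K]             # estimated factors, scaled to unit variance
beta_f_hat = u_centred.T @ f_hat / T      # N x K estimated loadings

explained = (s[:K] ** 2).sum() / (s ** 2).sum()
print(f"share of residual variance explained by {K} factors: {explained:.3f}")
```

The estimated factors are identified only up to a rotation, so `f_hat` need not match the simulated `f` column by column even when it spans the same space.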

An important feature of factor models is that they use time \(t\) factors to explain time \(t\) returns. This is to estimate the long-run risk exposures of the assets, which are useful for both risk control and portfolio construction. On the other hand, portfolio managers are also very concerned about time-varying expected returns. In this case, they often use a predictive factor model such as the following to forecast the returns:

\[\begin{eqnarray} \tilde{R}_{t+1}=\alpha+\beta\tilde{f}_{t}+\beta_{g}\tilde{g}_{t}+\tilde{\varepsilon}_{t+1} \tag{4.11} \end{eqnarray}\]where as before \(\tilde{f}_{t}\) and \(\tilde{g}_{t}\) are the latent and observable factors, respectively. The only difference is that the earlier \(\tilde{R}_{t}\) is now replaced by \(\tilde{R}_{t+1}\), so that equation (4.11) uses time \(t\) factors to forecast the future return \(\tilde{R}_{t+1}\).

Computationally, the estimation of the predictive factor model is the same as that of the standard factor models. However, it should be emphasised that the regression \(R^{2}\), a measure of model fit, is usually very high in the explanatory factor models. For example, when the return on a diversified portfolio is regressed on the market return, the coefficient of determination usually indicates strong explanatory power. In contrast, if a predictive factor model is used to forecast the expected returns of various assets, the \(R^{2}\) rarely exceeds 2%. This simply reflects the fact that asset returns are extremely difficult to predict in the real world. For example, Rapach, Strauss, and Zhou (2013) find that the \(R^{2}\) values are mostly less than 1% when forecasting industry returns using a variety of past economic variables and past industry returns.
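The contrast between explanatory and predictive fit can be illustrated with a small simulation. In the sketch below the market and portfolio returns are simulated as independent draws over time, so there is no true predictability by construction. The parameter values and the helper function `r_squared` are hypothetical and serve only to show how the two regressions differ.

```python
import numpy as np

def r_squared(y, x):
    """R-squared from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(4)
T = 600                                                 # e.g. 50 years of monthly data

market = rng.normal(0.005, 0.04, size=T)                # simulated market return
portfolio = 0.001 + 0.9 * market + rng.normal(0.0, 0.01, size=T)

# Explanatory regression: time t portfolio return on time t market return.
r2_explanatory = r_squared(portfolio, market)

# Predictive regression: time t+1 portfolio return on time t market return.
r2_predictive = r_squared(portfolio[1:], market[:-1])

print(f"explanatory R^2: {r2_explanatory:.3f}")
print(f"predictive  R^2: {r2_predictive:.4f}")
```

With these settings the contemporaneous regression explains most of the variation in the portfolio return, while the lagged regressor explains almost none of it.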

While the CAPM is the dominant model for determining the cost of capital, the APT and factor models are also popular. Fama and French (1992), Fama and French (1995), and Fama and French (1998) have proposed various specifications that make use of factor model structures. In addition to the market portfolio, the Fama-French factors measure the exposure of a stock to a portfolio that is long in small stocks and short in large stocks, and the stock’s exposure to a portfolio that is long in high book-to-market stocks (value stocks) and short in low book-to-market stocks (growth stocks). One of the findings of this area of research is that there is some weak empirical evidence that small stocks and value stocks have outperformed large stocks and growth stocks.

Ang, Andrew, and Joseph Chen. 2007. “CAPM over the Long Run: 1926-2001.” *Journal of Empirical Finance* 14 (1): 1–40.

Ang, Andrew, Robert J. Hodrick, Yuhang Xing, and Xiaoyan Zhang. 2006. “The Cross-Section of Volatility and Expected Returns.” *Journal of Finance* 61 (1): 259–99.

———. 2009. “High Idiosyncratic Volatility and Low Returns: International and Further U.S. Evidence.” *Journal of Financial Economics* 91 (1): 1–23.

Fama, Eugene F., and Kenneth R. French. 1992. “The Cross-Section of Expected Stock Returns.” *Journal of Finance* 47 (2): 427–65.

———. 1995. “Size and Book-to-Market Factors in Earnings and Returns.” *Journal of Finance* 50 (1): 131–55.

———. 1998. “Value Versus Growth: The International Evidence.” *Journal of Finance* 53 (6): 1975–99.

Rapach, David E., Jack K. Strauss, and Guofu Zhou. 2013. “International Stock Return Predictability: What Is the Role of the United States?” *Journal of Finance* 68 (4): 1633–62.

Ross, Stephen A. 1976. “The Arbitrage Theory of Capital Asset Pricing.” *Journal of Economic Theory* 13 (3): 341–60.

Ross, Stephen A., Randolph W. Westerfield, and Jeffrey F. Jaffe. 2002. *Corporate Finance*. 6th ed. Boston: McGraw-Hill/Irwin.

For an introduction to the APT, see Chapter 11 of Ross, Westerfield, and Jaffe (2002).

Although firms with higher betas tend to have higher average returns, Fama and French argue that the ability of beta to explain the cross-section of average stock returns is nil when the size of the firm’s market equity and the ratio of book equity to market equity are included as explanatory variables.

It must be said that the empirical evidence against the CAPM was marginal at best in most countries, with the exception of the United States. Nevertheless, the new proposed model clearly improved the fit with the data.