
Avoiding Biases in TAA Using The BARRA Altis System
| Length | 24 months | 24 months | 48 months | 48 months | 60 months | 60 months |
|---|---|---|---|---|---|---|
| Sample | IN | OUT | IN | OUT | IN | OUT |
| Start date | 9/92 | 9/94 | 9/90 | 9/94 | 9/89 | 9/94 |
| End date | 8/94 | 8/96 | 8/94 | 8/96 | 8/94 | 8/96 |
| Country: | | | | | | |
| USA-SP500 | 9.4% | 0.0% | 3.5% | 0.0% | 1.4% | 0.0% |
| JPN-NK225 | 0.0% | 0.0% | 0.1% | 0.0% | 0.0% | 0.0% |
| UK-FT | 37.4% | 2.9% | 5.4% | 0.0% | 4.3% | 0.0% |
| Average | 15.6% | 1.0% | 3.0% | 0.0% | 1.9% | 0.0% |
The bias arises because the explanatory variables contain lagged endogenous variables. In particular, the price of the asset appears implicitly in the asset return (the dependent variable) and in both explanatory variables (dividend yield and lagged return). By definition, the dividend yield is the ratio of the dividend per share to the price of a security:

$$\mathrm{Yld}_t = \frac{\mathrm{Div}_t}{P_t} \qquad (2)$$

The return on a share is made up of a price appreciation and a dividend component:

$$R_t = \frac{P_t - P_{t-1}}{P_{t-1}} + \frac{\mathrm{Div}_t}{P_{t-1}} \qquad (3)$$
Note that in a regression such as (1), both the explained variable (next period's return) and the explanatory variables (dividend yield and lagged return) include the asset price in their definition. This means that both explanatory variables include a lagged endogenous variable (price), and hence that the R² is biased. The size of this bias is difficult to derive explicitly because the lagged endogenous variable enters the dividend yield non-linearly.
The same problem occurs in almost all TAA models, not just those which use lagged returns and dividend yields. For example, another key TAA explanatory variable is cumulative returns. We define the cumulative returns at time T over the horizon length L as:
This type of explanatory variable also includes a lagged endogenous variable, since lagged returns appear explicitly.
It is impossible to derive an explicit formula for the size of the bias in R² except in very special cases. However, simulation techniques allow us to measure the bias quite accurately for particular models. Using a random number generator, we can replicate the logical relationship between returns, lagged returns, and dividend yields, while fixing the true relationship so that the model has zero explanatory power. This means that all the measured explanatory power is due to the bias.
The data generation process is set up in four simple phases:
1. We first generate a random return series by choosing returns from a normal distribution. Monthly returns are generated on the basis of a mean return of 6% annually and an annual standard deviation of 12%. Cumulative returns are calculated over the whole period as in Equation 4. For the first 11 observations of the series, the cumulative return is calculated with all the observable returns available.
2. We assume that dividends start at the value of 0.25 and grow at the constant rate of 0.3% monthly (3.6% annually).
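3. From these two series we back out a price series. Remember that total returns include a price change and a dividend element. We define $R_t$ as the observed return at time t:

$$R_t = \frac{P_t - P_{t-1} + \mathrm{Div}_t}{P_{t-1}}$$

from which we get

$$P_t = (1 + R_t)\,P_{t-1} - \mathrm{Div}_t$$

The starting price is defined as the fair value in a constant-growth dividend discount model (DDM).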
4. Finally, the dividend yield series is simply defined as $\mathrm{Yld}_t = \mathrm{Div}_t / P_t$.
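The four phases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the monthly mean and standard deviation are obtained by dividing the annual figures by 12 and by the square root of 12, the discount rate in the DDM is taken to be the mean monthly return, the dividend is assumed to be paid at the end of each month, and the cumulative return is assumed to be a compounded return over the last 12 months.

```python
import math
import random

def simulate_series(n_months, rng, mean_annual=0.06, sd_annual=0.12,
                    div0=0.25, div_growth=0.003):
    """Simulate returns, dividends, prices and yields with no true predictability."""
    mu = mean_annual / 12.0              # assumed monthly mean return
    sigma = sd_annual / math.sqrt(12.0)  # assumed monthly standard deviation
    # Phase 1: independent normal monthly returns.
    returns = [rng.gauss(mu, sigma) for _ in range(n_months)]
    # Phase 2: dividends start at div0 and grow at a constant monthly rate.
    dividends = [div0 * (1.0 + div_growth) ** t for t in range(n_months)]
    # Phase 3: starting price from a constant-growth DDM (assumed discount
    # rate = mu), then P_t = (1 + R_t) * P_{t-1} - Div_t.
    price = div0 / (mu - div_growth)
    prices = []
    for r, d in zip(returns, dividends):
        price = (1.0 + r) * price - d
        prices.append(price)
    # Phase 4: dividend yield is dividend over price.
    yields = [d / p for d, p in zip(dividends, prices)]
    return returns, dividends, prices, yields

def cumulative_returns(returns, horizon=12):
    """Compounded return over the last `horizon` months (assumed definition).

    Early observations use all the returns available so far.
    """
    out = []
    for t in range(len(returns)):
        growth = 1.0
        for r in returns[max(0, t - horizon + 1): t + 1]:
            growth *= 1.0 + r
        out.append(growth - 1.0)
    return out
```

Calling `simulate_series(60, random.Random(0))` returns four lists of 60 monthly observations; passing an explicit `random.Random` instance keeps each simulated sample reproducible.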
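We generate data for samples of 24, 48, 60, and 120 months. We use these data to run the two regressions:

$$R_t = \alpha + \beta\,\mathrm{Yld}_{t-1} + \gamma\,R_{t-1} + \varepsilon_t \qquad (6)$$

and

$$R_t = \alpha + \beta\,\mathrm{Yld}_{t-1} + \gamma\,CR_{t-1} + \varepsilon_t \qquad (7)$$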
We calculate R² as well as the t-statistics for both independent variables, and repeat the data generation and regressions 10,000 times for each sample size.
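A rough sketch of this Monte Carlo loop for the regression on lagged dividend yield and lagged return is given below. It is an illustration under assumptions rather than the procedure actually used: the function names are hypothetical, the timing conventions (end-of-month dividend, DDM discount rate equal to the mean monthly return) are assumed, and the two-regressor OLS fit is done on demeaned data with a closed-form 2x2 solve to stay within the standard library.

```python
import math
import random

def simulate(n, rng, mu=0.005, sigma=0.12 / math.sqrt(12), div0=0.25, g=0.003):
    """Returns and dividend yields with zero true predictability (assumed timing)."""
    price = div0 / (mu - g)            # constant-growth DDM starting price
    rets, ylds = [], []
    for t in range(n):
        r = rng.gauss(mu, sigma)
        div = div0 * (1 + g) ** t
        price = (1 + r) * price - div  # back out the price from the return
        rets.append(r)
        ylds.append(div / price)
    return rets, ylds

def r_squared(y, x1, x2):
    """R-squared of OLS of y on a constant, x1 and x2, using demeaned data."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    dy = [v - my for v in y]
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    syy = sum(c * c for c in dy)
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det   # slope on x1
    b2 = (s11 * s2y - s12 * s1y) / det   # slope on x2
    return (b1 * s1y + b2 * s2y) / syy   # explained over total sum of squares

def average_adjusted_r2(n_months, n_sims, seed=0):
    """Average adjusted R-squared of R_t on Yld_{t-1} and R_{t-1} over n_sims samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        rets, ylds = simulate(n_months, rng)
        y, x1, x2 = rets[1:], ylds[:-1], rets[:-1]   # lag both regressors once
        n = len(y)
        r2 = r_squared(y, x1, x2)
        total += 1.0 - (1.0 - r2) * (n - 1) / (n - 3)  # adjust for 2 regressors
    return total / n_sims
```

For example, `average_adjusted_r2(24, 10_000)` repeats the exercise for two-year samples. Because the simulated returns are unpredictable by construction, any average adjusted R² above zero reflects the bias rather than forecasting power.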
Empirically, returns often exhibit more skewness and kurtosis than the normal distribution implies. We take this into account by repeating the exercise assuming that returns follow a t(5)-distribution rather than a normal distribution.³
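One way to draw such fat-tailed returns with only the Python standard library is sketched below. The function name is hypothetical, and rescaling the draws so that they keep the same mean and standard deviation as in the normal case is an assumption; the text does not say how the t(5) returns were scaled.

```python
import math
import random

def student_t_return(rng, mu, sigma, df=5):
    """One return draw with mean mu, standard deviation sigma and t(df) shape."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)      # chi-square with df degrees of freedom
    t = z / math.sqrt(chi2 / df)                # standard Student-t variate
    scale = sigma * math.sqrt((df - 2.0) / df)  # t(df) has variance df / (df - 2)
    return mu + scale * t
```

Replacing the normal draw in the return generator with `student_t_return(rng, mu, sigma)` leaves the first two moments unchanged while making "exceptionally" large returns more frequent.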
Table 2 reports the average R² obtained over 10,000 simulations. The first two columns contain the adjusted R² of Regressions 6 and 7 in the case where returns are normally distributed. The average R² is over 7% with two years of monthly data and falls to 1.8% with ten years of monthly data. As expected, the size of the bias declines as the sample size increases, but it is still nearly 2% with a comparatively long sample of ten years. This is not negligible, especially when compared with the roughly 2% average in-sample R² of the 60-month regressions in Table 1.
| Sample size | Normal: Yld(t-1), R(t-1) | Normal: Yld(t-1), CR(t-1) | t(5): Yld(t-1), R(t-1) | t(5): Yld(t-1), CR(t-1) |
|---|---|---|---|---|
| 24 months | 7.74% | 7.49% | 11.11% | 10.80% |
| 48 months | 4.19% | 4.05% | 5.25% | 5.09% |
| 60 months | 3.46% | 3.25% | 4.13% | 4.01% |
| 120 months | 1.77% | 1.77% | 1.94% | 1.97% |
Our simulations help explain the difference observed in Table 1 between in- and out-of-sample statistics. With normally distributed returns, bias alone accounts for an average R² of nearly 8% in 24-month samples and nearly 2% in ten-year samples. In other words, the in-sample R² values reported in Table 1 should not be interpreted as evidence of true forecasting value for this TAA model; they can arise from the bias described above even when the model has no forecasting value at all.
In Table 1 the in-sample R² for the U.S. with a sample size of 24 months is 9.4%, while the out-of-sample R² for this model is zero. Yet the first column of Table 2 indicates that if returns follow a normal distribution, the bias should account for an R² of "only" about 7.7%. Lifting the assumption of normality of returns helps explain this further discrepancy.
In the last two columns of Table 2 we drop the assumption of normality and instead assume that returns follow a t(5)-distribution. This distribution has greater kurtosis than the normal; that is to say, returns which are t(5)-distributed can be "exceptionally" large more often than under the normal distribution. The last two columns of Table 2 show higher R² values than under the assumption of normality, particularly where the sample size is small: with 24 observations, the average R² attributed to biases rises from 7.7% to 11.1%. This may explain why the U.S. model, with an in-sample R² of 9.4%, showed no out-of-sample forecasting power in Table 1. It is also consistent with the t(5)-distribution being a better approximation of the returns' true distribution. The difference between the normal and t-distribution results matters most with a short sample period.
Finally, Table 3 looks at biases in the t-statistics of the coefficient estimates in Equations 6 and 7. The percentages are easy to interpret if we bear in mind that, in the absence of bias and since our return series are randomly generated, none of the coefficient estimates in either regression should be significantly different from zero. That is to say, the t-statistics reported in Table 3 should follow the t-distribution, with a 2.5% probability of a statistic above 1.96 and a 2.5% probability of a statistic below -1.96.
The absence of any true relationship in our generated model means that we should expect to see significant positive coefficients 2.5% of the time and significant negative coefficients 2.5% of the time (the significance test uses a 5% rejection region). For the dividend yield coefficient, however, the bias pushes t-statistics above 1.96 in roughly 28% to 34% of the simulations. This proportion does not change significantly with either the sample size or the return distribution. Measuring the predictive power of the dividend yield using in-sample t-statistics is thus fraught with problems. Again, using out-of-sample statistics will eliminate the bias.
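The rejection frequencies can be illustrated with a simplified sketch that regresses the return on the lagged dividend yield alone. This is not the two-variable regression used for Table 3: dropping the lagged return is a deliberate simplification to keep the OLS algebra short, the function names are hypothetical, and the data generation uses the same assumed timing conventions as before (end-of-month dividend, DDM discount rate equal to the mean monthly return).

```python
import math
import random

def simulate(n, rng, mu=0.005, sigma=0.12 / math.sqrt(12), div0=0.25, g=0.003):
    """Returns and dividend yields with zero true predictability (assumed timing)."""
    price = div0 / (mu - g)            # constant-growth DDM starting price
    rets, ylds = [], []
    for t in range(n):
        r = rng.gauss(mu, sigma)
        div = div0 * (1 + g) ** t
        price = (1 + r) * price - div
        rets.append(r)
        ylds.append(div / price)
    return rets, ylds

def slope_t_stat(y, x):
    """t-statistic of the slope in an OLS regression of y on a constant and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    sse = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return beta / std_err

def tail_frequencies(n_months, n_sims, seed=0, crit=1.96):
    """Share of simulated t-statistics below -crit and above +crit."""
    rng = random.Random(seed)
    low = high = 0
    for _ in range(n_sims):
        rets, ylds = simulate(n_months, rng)
        t = slope_t_stat(rets[1:], ylds[:-1])   # R_t on Yld_{t-1}
        if t > crit:
            high += 1
        elif t < -crit:
            low += 1
    return low / n_sims, high / n_sims
```

With unbiased t-statistics, both shares returned by `tail_frequencies(60, 10_000)` would be close to 2.5%; a much larger upper-tail share than lower-tail share is the signature of the bias discussed here.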
We might expect the bias in the cumulative-return t-statistics to increase with the horizon length L used in the cumulative returns calculation (4).⁴ Indeed, the bias is not huge with one-month cumulative returns. But with a 12-month accumulation horizon it becomes sizeable: over 9% of t-statistics indicate a significantly positive lagged return sensitivity in the shortest sample period considered. This proportion decreases to about 4.5% with a ten-year sample size, still two percentage points above what would occur in the absence of bias.
The in-sample bias caused by the use of lagged returns can be kept under control by ensuring that the accumulation horizon is not too long and that the in-sample period is sufficiently long. With dividend yields, however, only out-of-sample statistics can eliminate the bias in t-statistics revealed in Tables 3A and 3B.
| No. of months | Normal: t(Yld) < -1.96 | Normal: t(Yld) > 1.96 | Normal: t(R) < -1.96 | Normal: t(R) > 1.96 | t(5): t(Yld) < -1.96 | t(5): t(Yld) > 1.96 | t(5): t(R) < -1.96 | t(5): t(R) > 1.96 |
|---|---|---|---|---|---|---|---|---|
| 24 | 0.05% | 30.79% | 1.76% | 4.60% | 0.08% | 30.31% | 1.37% | 4.33% |
| 48 | 0.03% | 31.90% | 1.43% | 3.97% | 0.04% | 30.23% | 1.91% | 3.28% |
| 60 | 0.03% | 32.16% | 2.01% | 4.00% | 0.03% | 30.67% | 1.74% | 3.18% |
| 120 | 0.01% | 32.79% | 1.75% | 2.89% | 0.00% | 32.97% | 2.03% | 2.80% |
| No. of months | Normal: t(Yld) < -1.96 | Normal: t(Yld) > 1.96 | Normal: t(R) < -1.96 | Normal: t(R) > 1.96 | t(5): t(Yld) < -1.96 | t(5): t(Yld) > 1.96 | t(5): t(R) < -1.96 | t(5): t(R) > 1.96 |
|---|---|---|---|---|---|---|---|---|
| 24 | 1.23% | 29.01% | 7.14% | 9.62% | 1.32% | 28.36% | 7.06% | 9.24% |
| 48 | 0.41% | 32.90% | 4.89% | 7.34% | 0.34% | 32.69% | 4.06% | 7.07% |
| 60 | 0.37% | 33.55% | 3.97% | 6.46% | 0.25% | 32.19% | 4.18% | 6.49% |
| 120 | 0.07% | 33.55% | 2.96% | 4.80% | 0.13% | 33.77% | 3.26% | 4.46% |
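The out-of-sample check recommended above can be sketched as follows. The sketch is illustrative only and says nothing about how Altis computes its statistics: the function names are hypothetical, a single explanatory variable is used for brevity, and the out-of-sample R² is assumed to compare the model's forecast errors with those of a naive forecast equal to the in-sample mean.

```python
def fit_line(x, y):
    """OLS intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - beta * mx, beta

def in_and_out_of_sample_r2(x, y, n_in):
    """Fit on the first n_in observations, then score both sub-samples.

    Both R-squared values use the in-sample mean of y as the benchmark
    forecast (an assumed definition).
    """
    alpha, beta = fit_line(x[:n_in], y[:n_in])
    benchmark = sum(y[:n_in]) / n_in

    def score(xs, ys):
        sse = sum((b - alpha - beta * a) ** 2 for a, b in zip(xs, ys))
        sst = sum((b - benchmark) ** 2 for b in ys)
        return 1.0 - sse / sst

    return score(x[:n_in], y[:n_in]), score(x[n_in:], y[n_in:])
```

Under this definition the in-sample figure can never be negative, whereas the out-of-sample figure can: a negative value simply means the fitted model forecast the hold-out period worse than the in-sample mean would have.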
Investment managers building TAA models can choose from a wide range of statistical software packages to aid their research. The new Altis System has at least two advantages over standard statistical software: it links the estimation procedure to the database and portfolio optimization capabilities of the World Markets Model, and it is explicitly constructed to facilitate out-of-sample testing. This allows easy application of important reliability checks. In this article, we have shown how this facility of Altis allows the investment manager to avoid in-sample biases which might otherwise seriously degrade the reliability of TAA models.
* We would like to thank Ross Curds for helpful comments.
1 See "Global Asset Allocation: The BARRA Altis System and World Markets Model" by Nick Sudbury in this Newsletter.
2 See Nelson, Charles R. and Kim, Myung J., "Predictable Stock Returns: The Role of Small Sample Bias," Journal of Finance, Volume XLVIII, No. 2, June 1993, pp. 641-662, and Goetzmann, William N. and Jorion, Philippe, "Testing the Predictive Power of Dividend Yields," Journal of Finance, Volume XLVIII, No. 2, June 1993, pp. 663-679.
3 The parameter of the t-distribution is chosen to fit the kurtosis of a time series of the monthly total return on the S&P 500 Index in the U.S. over the period February 1972 to November 1996.
4 Increasing the length of the cumulative return period is similar to shrinking the sample size, and so it might increase the size of this small-sample bias.