Uncovering Sparsity and Heterogeneity in Firm-Level Return Predictability Using Machine Learning

Abstract: We develop an approach that combines the estimation of monthly firm-level expected returns with the assignment of firms to (possibly latent) groups, both based on observable characteristics, using machine learning principles with linear models. The best-performing methods are flexible two-stage sparse models that capture group-membership predictive relationships. Portfolios formed to exploit these group-varying predictions, using a parsimonious set of characteristics, deliver economically meaningful returns with low turnover. We propose statistical tests based on nonparametric bootstrapping for our results and detail how different characteristics may matter for different groups of firms, comparing our findings with the existing literature.
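
The abstract does not spell out the estimation procedure, so the following is only a minimal sketch of what a two-stage sparse model with group assignment could look like. It assumes, purely for illustration, that groups are found by k-means on the characteristics and that each group gets its own lasso regression fitted by coordinate descent. The function names (`kmeans`, `lasso_cd`, `fit_two_stage`), the penalty `lam`, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-first initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == g].mean(axis=0) if np.any(labels == g)
                        else centers[g] for g in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float)  # copy of y; equals y - X @ beta while beta == 0
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]          # remove feature j's contribution
            rho = X[:, j] @ resid / n
            # Soft-thresholding sets small coefficients exactly to zero.
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]          # add the updated contribution back
    return beta


def fit_two_stage(X, y, k, lam):
    """Stage 1: assign firms to groups. Stage 2: one sparse linear model per group."""
    labels = kmeans(X, k)
    betas = np.zeros((k, X.shape[1]))
    intercepts = np.zeros(k)
    for g in range(k):
        m = labels == g
        if not m.any():
            continue
        x_mean, y_mean = X[m].mean(axis=0), y[m].mean()
        # Demean within the group so the intercept is left unpenalised.
        betas[g] = lasso_cd(X[m] - x_mean, y[m] - y_mean, lam)
        intercepts[g] = y_mean - x_mean @ betas[g]
    return labels, betas, intercepts


# Synthetic example (assumed data): two groups separated on characteristic 0,
# with returns driven by characteristic 1 in one group and 2 in the other.
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(2 * n, p))
X[:n, 0] -= 5.0
X[n:, 0] += 5.0
noise = 0.1 * rng.normal(size=2 * n)
y = np.where(np.arange(2 * n) < n, X[:, 1], -X[:, 2]) + noise

labels, betas, intercepts = fit_two_stage(X, y, k=2, lam=0.1)
print(labels[:3], labels[-3:])
print(np.round(betas, 2))
```

In this toy setting each group's coefficient vector should keep only the characteristic that actually drives returns in that group, which is the sense in which different characteristics can matter for different groups of firms. A real application would need out-of-sample tuning of the number of groups and the penalty, which this sketch leaves out.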
