Empirical Asset Pricing via Machine Learning

We perform a comparative analysis of machine learning methods for the canonical problem of empirical asset pricing: measuring asset risk premiums. We demonstrate large economic gains to investors using machine learning forecasts, in some cases doubling the performance of leading regression-based strategies from the literature. We identify the best-performing methods (trees and neural networks) and trace their predictive gains to allowing nonlinear predictor interactions missed by other methods. All methods agree on the same set of dominant predictive signals, a set that includes variations on momentum, liquidity, and volatility. The authors have furnished an Internet Appendix, available on the Oxford University Press website alongside the final published article.
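The abstract attributes the predictive gains of trees and neural networks to nonlinear predictor interactions that linear methods miss. Below is a minimal sketch of that mechanism, not the paper's code: it uses synthetic data, hypothetical characteristic names (momentum, volatility), and off-the-shelf scikit-learn models, with out-of-sample R² computed against a zero forecast, a common convention in the return-forecasting literature.

```python
# Minimal sketch (not the paper's code): a synthetic panel in which the true
# risk premium depends on a nonlinear interaction of two characteristics.
# All names and settings here are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 20_000                                   # stock-month observations
momentum = rng.normal(size=n)                # hypothetical characteristics
volatility = rng.normal(size=n)
X = np.column_stack([momentum, volatility])

# True conditional expected return: an interaction term plus a kink,
# both invisible to a model that is linear in (momentum, volatility).
mu = 0.5 * momentum * volatility - 0.3 * np.maximum(volatility, 0.0)
r = mu + rng.normal(scale=2.0, size=n)       # realized returns are mostly noise

split = n // 2                               # pseudo out-of-sample split
ols = LinearRegression().fit(X[:split], r[:split])
forest = RandomForestRegressor(n_estimators=200, max_depth=4,
                               random_state=0).fit(X[:split], r[:split])

def r2_oos(pred, actual):
    # Predictive R^2 against a zero forecast:
    # 1 - SSE(model) / SSE(zero forecast).
    return 1.0 - np.sum((actual - pred) ** 2) / np.sum(actual ** 2)

print("OLS    R2_oos:", r2_oos(ols.predict(X[split:]), r[split:]))
print("Forest R2_oos:", r2_oos(forest.predict(X[split:]), r[split:]))
```

Because the simulated expected return depends on the product of two characteristics, the linear fit can recover at most their marginal effects, while shallow trees can split sequentially on both and approximate the interaction. This is the same logic the abstract describes, here at toy scale and under the stated assumptions.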
