Empirical Asset Pricing via Machine Learning

We perform a comparative analysis of machine learning methods for the canonical problem of empirical asset pricing: measuring asset risk premia. We demonstrate large economic gains to investors using machine learning forecasts, in some cases doubling the performance of leading regression-based strategies from the literature. We identify the best-performing methods (trees and neural networks) and trace their predictive gains to their ability to capture nonlinear predictor interactions that other methods miss. All methods agree on the same set of dominant predictive signals, which includes variations on momentum, liquidity, and volatility. Improved risk premium measurement through machine learning simplifies the investigation into the economic mechanisms of asset pricing and highlights the value of machine learning in financial innovation.
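To make the nonlinear-interaction point concrete, the following is a minimal sketch, not the paper's empirical design: on hypothetical simulated data where excess returns load on a momentum-times-volatility interaction, a linear regression captures only the linear momentum slope, while shallow boosted trees also recover the interaction and improve out-of-sample R-squared. All variable names and parameter choices here are illustrative assumptions.

    # Minimal sketch (hypothetical simulated data, not the paper's design):
    # returns depend on an interaction of two stock characteristics. A linear
    # regression misses the interaction; a shallow tree ensemble captures it.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(0)
    n = 20_000

    # Illustrative characteristics: momentum and volatility, standardized.
    momentum = rng.standard_normal(n)
    volatility = rng.standard_normal(n)
    X = np.column_stack([momentum, volatility])

    # True conditional expected return includes a nonlinear interaction term;
    # returns are mostly noise, mimicking the low signal-to-noise ratio of
    # monthly stock returns.
    signal = 0.2 * momentum + 0.2 * momentum * volatility
    returns = signal + rng.standard_normal(n)

    X_train, X_test = X[: n // 2], X[n // 2 :]
    y_train, y_test = returns[: n // 2], returns[n // 2 :]

    # Linear benchmark: fits the momentum slope but not the interaction.
    ols = LinearRegression().fit(X_train, y_train)

    # Depth-2 boosted trees: each split conditions on the previous one, so
    # pairwise characteristic interactions are captured automatically.
    gbr = GradientBoostingRegressor(
        max_depth=2, n_estimators=300, learning_rate=0.05, random_state=0
    ).fit(X_train, y_train)

    for name, model in [("OLS", ols), ("Boosted trees", gbr)]:
        r2 = r2_score(y_test, model.predict(X_test))
        print(f"{name}: out-of-sample R2 = {r2:.4f}")

In this setup the tree ensemble's out-of-sample R-squared exceeds the linear benchmark's roughly by the share of signal variance attributable to the interaction term, which is the mechanism the paper points to for the gains of trees and neural networks.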
