Forecasting retained earnings of privately held companies with PCA and L1 regression

We use proprietary data collected by SVB Analytics, an affiliate of Silicon Valley Bank, to forecast the retained earnings of privately held companies. Combining principal component analysis (PCA) with L1/quantile regression, we build multivariate linear models that feature excellent in-sample fit and strong out-of-sample predictive accuracy. The combined PCA and L1 technique deals effectively with multicollinearity and non-normality of the data, and it compares favorably with a variety of other models. Additionally, we propose a variable ranking procedure that identifies which variables from the current quarter are most predictive of the next quarter's retained earnings. We fit models to the top five variables identified by the ranking procedure and thereby discover interpretable models with excellent out-of-sample performance. Copyright © 2013 John Wiley & Sons, Ltd.
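
The abstract does not spell out the fitting procedure, so the following is only a minimal sketch of the general idea of pairing PCA with L1 (least absolute deviations) regression, not the authors' implementation. Everything in it is an assumption made for illustration: the data are synthetic, the helper names `pca_scores` and `lad_fit` are hypothetical, the number of retained components is arbitrary, and the L1 fit uses iteratively reweighted least squares as a simple stand-in for whatever solver the paper used.

```python
import numpy as np

def pca_scores(X, k):
    """Standardize the columns of X and project onto the top-k principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:k].T

def lad_fit(A, y, n_iter=200, eps=1e-8):
    """Approximate L1 (median) regression with an intercept, via IRLS (an assumed solver)."""
    A1 = np.column_stack([np.ones(len(y)), A])
    beta = np.linalg.lstsq(A1, y, rcond=None)[0]  # start from the least-squares fit
    for _ in range(n_iter):
        resid = np.abs(y - A1 @ beta)
        # Weight 1/|residual| turns squared loss into absolute loss; eps avoids division by zero.
        sw = 1.0 / np.sqrt(np.maximum(resid, eps))
        beta = np.linalg.lstsq(A1 * sw[:, None], y * sw, rcond=None)[0]
    return beta

# Synthetic stand-in data: six predictors driven by two latent factors
# (strong multicollinearity) and heavy-tailed Laplace noise (non-normality).
rng = np.random.default_rng(0)
n = 300
base = rng.normal(size=(n, 2))
X = np.column_stack(
    [base[:, 0] + 0.01 * rng.normal(size=n) for _ in range(3)]
    + [base[:, 1] + 0.01 * rng.normal(size=n) for _ in range(3)]
)
y = 2.0 * base[:, 0] - 1.0 * base[:, 1] + rng.laplace(scale=0.5, size=n)

scores = pca_scores(X, k=2)      # uncorrelated regressors replace the collinear ones
beta = lad_fit(scores, y)        # L1 fit on the component scores

A1 = np.column_stack([np.ones(n), scores])
ols = np.linalg.lstsq(A1, y, rcond=None)[0]
print("mean abs error, L1 fit :", np.mean(np.abs(y - A1 @ beta)))
print("mean abs error, OLS fit:", np.mean(np.abs(y - A1 @ ols)))
```

Regressing on component scores rather than on the raw predictors sidesteps the near-singular design matrix that collinear inputs would otherwise produce, while the absolute-value loss is less sensitive to heavy-tailed errors than squared loss. A forecasting setup like the one described would additionally evaluate the fit on held-out quarters, which this sketch omits.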
