Predicting Mortality from Credit Reports

Data on hundreds of variables related to individual consumer finance behavior (such as credit card and loan activity) are routinely collected in many countries and play an important role in lending decisions. We postulate that the level of detail in these data makes them useful for predicting outcomes in seemingly unrelated domains such as individual health. We build a series of machine learning models to demonstrate that credit report data can be used to predict individual mortality. Variable groups related to credit cards and various loans, mostly unsecured loans, are shown to carry significant predictive power. Lags of these variables are also significant, indicating that dynamics matter as well. Improved mortality predictions based on consumer finance data can have important economic implications in insurance markets but may also raise privacy concerns.
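
To make the role of lagged variables concrete, the following is a minimal sketch of how lag features might be built from one consumer's time-ordered credit history before being passed to a predictive model. The function name `make_lagged_rows`, the field names, and the balance figures are all hypothetical illustrations; the abstract does not describe the actual feature construction.

```python
from typing import Dict, List, Optional


def make_lagged_rows(history: List[float], n_lags: int) -> List[Dict[str, Optional[float]]]:
    """Return one row per period holding the current value and its n_lags lags.

    `history` is one variable for one person, ordered from oldest to newest.
    Lags that reach before the start of the history are set to None.
    """
    rows = []
    for t, value in enumerate(history):
        row: Dict[str, Optional[float]] = {"current": value}
        for k in range(1, n_lags + 1):
            # lag_k is the value observed k periods before period t
            row[f"lag_{k}"] = history[t - k] if t - k >= 0 else None
        rows.append(row)
    return rows


# Made-up credit card balances for a single hypothetical consumer.
balances = [1200.0, 1500.0, 900.0, 2100.0]
rows = make_lagged_rows(balances, n_lags=2)
```

Each row pairs a current value with its recent past, so a model trained on such rows can pick up changes over time as well as levels.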
