Can Big Data Defeat Traditional Credit Rating?

This paper examines the impact of large-scale alternative data, or big data, on predicting consumer loan delinquency for a traditional lender. Using a unique proprietary dataset covering 700 million individuals and 20,000 variables, we construct a big data credit score by applying machine learning techniques that handle high dimensionality and pervasive missing values. We find that incorporating the big data credit score improves the lender’s accuracy in predicting a borrower’s delinquency likelihood by 22.6%. We identify two possible channels through which big data contributes: it provides additional information on borrowers without public credit records, and it corrects financial misreporting.
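The abstract does not spell out the machine learning pipeline behind the big data credit score. As a rough illustration only, the sketch below shows one common way to collapse a wide feature matrix with many missing entries into a single delinquency score: mean imputation plus a 0/1 missingness indicator per column, followed by an L2-regularised logistic regression fitted by batch gradient descent. This is an assumption, not the paper's method. The function names, the synthetic data generator `make_synthetic`, and all hyperparameters are hypothetical, and the dimensionality-reduction step a 20,000-variable dataset would need is omitted.

```python
import math
import random

# Hypothetical sketch, NOT the paper's method. Missing values are encoded as None.


def impute_with_indicators(rows, means=None):
    """Mean-impute each column and append a 0/1 missingness flag per column.

    Pass the training `means` back in when scoring new borrowers so that
    imputation uses training-set statistics only.
    """
    n_cols = len(rows[0])
    if means is None:
        means = []
        for j in range(n_cols):
            observed = [r[j] for r in rows if r[j] is not None]
            means.append(sum(observed) / len(observed) if observed else 0.0)
    out = []
    for r in rows:
        values = [r[j] if r[j] is not None else means[j] for j in range(n_cols)]
        flags = [1.0 if r[j] is None else 0.0 for j in range(n_cols)]
        out.append(values + flags)
    return out, means


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=500, l2=0.01):
    """Batch gradient descent on the L2-regularised logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(X, w, b):
    """Predicted delinquency probability for each (already imputed) row."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def make_synthetic(n, seed=0):
    """Hypothetical toy data: two alternative-data features, the second often missing."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        risk = rng.random()
        f1 = risk + rng.gauss(0, 0.1)
        f2 = None if rng.random() < 0.4 else 1 - risk + rng.gauss(0, 0.1)
        rows.append([f1, f2])
        labels.append(1 if risk > 0.5 else 0)
    return rows, labels
```

For example, `rows, labels = make_synthetic(200)` followed by `X, means = impute_with_indicators(rows)`, `w, b = fit_logistic(X, labels)` and `score(X, w, b)` yields one probability per borrower. In a setting like the one the abstract describes, such a score would then enter the lender's delinquency model alongside its traditional credit information; the missingness flags let the model treat "no record" as information in its own right rather than silently replacing it with an average.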
