Regression and Machine Learning Methods to Predict Discrete Outcomes in Accounting Research

Predictive modeling iteratively tries combinations and transformations of a set of variables to generate a decision rule that predicts outcomes for new observations. Although accounting researchers have shown keen interest in predictive modeling, we identify a lack of accessible, applied guidance on this topic for accounting settings. This gap has become more salient with the increasing availability of machine learning models that use unfamiliar terminology, can be estimated with several "competing" algorithms, and produce different outputs than the models used for causal inference. To fill this gap, we provide an overview of how to predict discrete outcomes with logistic regression and two machine learning models used in recent studies: support vector machines and gradient boosting. We also include guidance and a comprehensive example, predicting investigations by the U.S. Securities and Exchange Commission, that illustrates the elements of the prediction process and highlights the importance of "out-of-sample" accuracy and the unique aspects of presenting a prediction model's results.
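
As a rough illustration of why "out-of-sample" accuracy matters, the following minimal sketch fits a logistic regression on a training sample and scores it on held-out observations. It is not the paper's SEC example: the simulated data, the coefficients in `simulate`, the 300/100 split, and all function names are assumptions made for illustration only. Support vector machines and gradient boosting would be evaluated under the same holdout design, with a different fitting step.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(n, rng):
    """Hypothetical data: two predictors and a binary outcome (assumed DGP)."""
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        p = sigmoid(-1.5 + 1.5 * x1 - 1.0 * x2)
        y = 1 if rng.random() < p else 0
        rows.append(((x1, x2), y))
    return rows


def fit_logit(rows, lr=0.1, iters=500):
    """Logistic regression by batch gradient descent on the mean log-loss."""
    w = [0.0, 0.0, 0.0]  # intercept, coefficient on x1, coefficient on x2
    n = len(rows)
    for _ in range(iters):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in rows:
            err = sigmoid(w[0] + w[1] * x1 + w[2] * x2) - y
            grad[0] += err
            grad[1] += err * x1
            grad[2] += err * x2
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def predict(w, rows):
    # Predicted probability of the outcome for each observation.
    return [sigmoid(w[0] + w[1] * x1 + w[2] * x2) for (x1, x2), _ in rows]


def auc(scores, labels):
    """Area under the ROC curve as the share of correctly ranked (1, 0) pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(42)
data = simulate(400, rng)
train, test = data[:300], data[300:]  # estimation sample vs. holdout sample

w = fit_logit(train)
auc_train = auc(predict(w, train), [y for _, y in train])
auc_test = auc(predict(w, test), [y for _, y in test])

print("coefficients:", [round(v, 3) for v in w])
print("in-sample AUC:", round(auc_train, 3))
print("out-of-sample AUC:", round(auc_test, 3))
```

The in-sample figure describes the data used for fitting, while the holdout figure is the one that speaks to performance on new observations. With a flexible model or many candidate variables, the gap between the two would typically be wider than in this simple two-variable setting.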
