Can Linguistic Predictors Detect Fraudulent Financial Filings?

ABSTRACT: Extensive research has been done on the analytical and empirical examination of financial data in annual reports to detect fraud; however, there is scant research on the analysis of text in annual reports to detect fraud. The basic premise of this research is that there are clues hidden in the text that can be detected to determine the likelihood of fraud. In this research, we examine both the verbal content and the presentation style of the qualitative portion of the annual reports using natural language processing tools and explore linguistic features that distinguish fraudulent annual reports from nonfraudulent annual reports. Our results indicate that employment of linguistic features is an effective means for detecting fraud. We were able to improve the prediction accuracy of our fraud detection model from initial baseline results of 56.75 percent accuracy, using a “bag of words” approach, to 89.51 percent accuracy when we incorporated linguistically motivated features inspired by our infor...
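The contrast the abstract draws between a "bag of words" baseline and linguistically motivated features can be sketched roughly as follows. This is a minimal illustration only, not the authors' feature set: the function names, the `HEDGES` word list, and the three style features (average sentence length, type-token ratio, hedge ratio) are all assumptions chosen for the example.

```python
import re
from collections import Counter

# Hypothetical hedge lexicon for illustration only; not taken from the paper.
HEDGES = {"may", "might", "could", "possibly", "approximately", "believe"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    """Baseline representation: raw word counts, ignoring order and style."""
    return Counter(tokenize(text))


def style_features(text):
    """A few illustrative presentation-style features (assumed, not the paper's)."""
    tokens = tokenize(text)
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_tokens = len(tokens) or 1  # guard against division by zero on empty text
    return {
        "avg_sentence_length": len(tokens) / (len(sentences) or 1),
        "type_token_ratio": len(set(tokens)) / n_tokens,
        "hedge_ratio": sum(t in HEDGES for t in tokens) / n_tokens,
    }


if __name__ == "__main__":
    sample = "Revenue may increase. Revenue could possibly decline."
    print(bag_of_words(sample))
    print(style_features(sample))
```

In a setup like this, the word counts would feed a baseline classifier, and the style features would be appended as extra columns; which classifier and which features actually produce the reported accuracy gain is not stated in the abstract.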
