From Spin to Swindle: Identifying Falsification in Financial Text

Despite legislative attempts to curtail it, financial statement fraud continues unabated. This study makes a renewed attempt to aid detection of this misconduct by applying linguistic analysis and data mining to the narrative sections of annual reports/10-K forms. Unlike similar research, this paper extracts three distinct sets of features from a newly constructed corpus of narratives (408 annual reports/10-Ks, 6.5 million words) from fraud and non-fraud firms. Each of the three feature sets is separately put through a suite of classification algorithms to determine classifier performance on this binary fraud/non-fraud discrimination task. The results give a clear indication that the language deployed by management engaged in wilful falsification of firm performance is discernibly different from that of truth-tellers. For the first time, this interdisciplinary research extracts readability features at a much deeper level, attempts to draw out collocations using n-grams, and measures tone using appropriate financial dictionaries. This approach to fraud detection, combining linguistic analysis with machine learning-driven data mining, could be used by auditors in assessing firms' financial reporting and in the early detection of possible misdemeanours.
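As an illustration only, the three feature families named above (readability, n-gram collocations, and dictionary-based tone) might be sketched as follows. This is a minimal sketch under stated assumptions, not the study's implementation: the function names, the two toy word lists, and the sample text are all hypothetical, and the simple sentence-length and word-length measures stand in for the deeper readability features the paper refers to.

```python
import re
from collections import Counter

# Hypothetical toy word lists for illustration; a real analysis would
# load a full financial sentiment dictionary instead.
NEGATIVE = {"loss", "impairment", "litigation", "decline"}
POSITIVE = {"growth", "profit", "improved", "strong"}


def tokenize(text):
    """Lower-case the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def readability_features(text):
    """Two crude readability proxies: words per sentence, letters per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),
    }


def bigram_counts(text):
    """Count adjacent word pairs (n-grams with n = 2).

    Sentence boundaries are ignored, so pairs may span two sentences.
    """
    words = tokenize(text)
    return Counter(zip(words, words[1:]))


def tone_features(text):
    """Share of tokens that appear in each toy tone word list."""
    words = tokenize(text)
    n = max(len(words), 1)
    return {
        "neg_share": sum(w in NEGATIVE for w in words) / n,
        "pos_share": sum(w in POSITIVE for w in words) / n,
    }


# Hypothetical narrative fragment, not drawn from the corpus.
sample = "Revenue growth was strong. The firm recorded a loss on litigation."
features = {**readability_features(sample), **tone_features(sample)}
bigrams = bigram_counts(sample)
```

Each dictionary of features could then be assembled into one row per report and passed, one feature set at a time, to whichever classifiers are being compared.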
