Do Sentiments Matter in Fraud Detection? Estimating Semantic Orientation of Annual Reports

We present a novel approach for analysing the qualitative content of annual reports. Using natural language processing techniques, we determine whether the sentiment expressed in the text matters in fraud detection. We focus on the Management Discussion and Analysis (MD&A) section because, unlike other components of the annual report, it contains nonfactual content. We measure the sentiment expressed in the text along three dimensions (polarity, subjectivity, and intensity) and investigate in depth whether truthful and fraudulent MD&As differ on each. Our results show that fraudulent MD&As contain, on average, three times more positive sentiment and four times more negative sentiment than truthful MD&As. This suggests that the use of both positive and negative sentiment is more pronounced in fraudulent MD&As. We further find that, compared with truthful MD&As, fraudulent MD&As contain a greater proportion of subjective content than objective content. This suggests that subjectivity clues, such as an excess of adjectives and adverbs, could be an indicator of fraud. Clear cases of fraud show a higher intensity of sentiment, exhibited by greater use of adverbs in the "adverb modifying adjective" pattern. Based on the results of this study, frequent use of intensifiers, particularly in this pattern, could be another indicator of fraud. Moreover, the dimensions of subjectivity and intensity help to classify accurately, as fraudulent or truthful, borderline MD&As that are equal in sentiment polarity. Taken together, these findings suggest that fraudulent MD&As, in contrast to truthful MD&As, contain higher sentiment content. Copyright © 2016 John Wiley & Sons, Ltd.
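The three dimensions described above can be illustrated with a minimal Python sketch. This is not the method used in the study: the word lists below are tiny hypothetical lexicons invented for illustration, the function name `sentiment_profile` is an assumption, and a real analysis would rely on a part-of-speech tagger and an established sentiment lexicon. The sketch counts positive and negative words (polarity), adjectives and adverbs (a rough proxy for subjectivity), and adverb-adjective bigrams (a rough proxy for intensity).

```python
import re

# Hypothetical toy lexicons, for illustration only; not the study's resources.
POSITIVE = {"strong", "excellent", "growth", "successful", "improved"}
NEGATIVE = {"loss", "decline", "weak", "adverse", "impairment"}
ADJECTIVES = {"strong", "excellent", "successful", "weak", "adverse", "significant"}
ADVERBS = {"very", "extremely", "highly", "significantly", "substantially"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_profile(text):
    """Return raw counts for the three sentiment dimensions of one text."""
    tokens = tokenize(text)
    # Polarity: occurrences of positive and negative lexicon words.
    positive = sum(t in POSITIVE for t in tokens)
    negative = sum(t in NEGATIVE for t in tokens)
    # Subjectivity proxy: number of adjectives and adverbs.
    subjective = sum(t in ADJECTIVES or t in ADVERBS for t in tokens)
    # Intensity proxy: adverb immediately followed by an adjective.
    adverb_adjective = sum(
        a in ADVERBS and b in ADJECTIVES for a, b in zip(tokens, tokens[1:])
    )
    return {
        "tokens": len(tokens),
        "positive": positive,
        "negative": negative,
        "subjective": subjective,
        "adverb_adjective": adverb_adjective,
    }


if __name__ == "__main__":
    sample = "We had extremely strong growth despite a very weak quarter and a loss."
    print(sentiment_profile(sample))
```

Dividing each count by the token total would give per-document rates that can be compared between two groups of MD&As; any threshold for flagging a document would have to be estimated from labelled data rather than assumed.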
