Textual Analysis in Accounting and Finance: A Survey

Relative to quantitative methods traditionally used in accounting and finance, textual analysis is substantially less precise. Thus, understanding the art is of equal importance to understanding the science. In this survey, we describe the nuances of the method and, as users of textual analysis, some of the tripwires in implementation. We also review the contemporary textual analysis literature and highlight areas of future research.
