Multilingual Financial Narrative Processing: Analyzing Annual Reports in English, Spanish, and Portuguese

This chapter describes and evaluates the use of Information Extraction and Natural Language Processing methods to extract and analyse financial annual reports in three languages: English, Spanish and Portuguese. The approach retains information on document structure, which is needed to distinguish clearly between the narrative and financial statement components of annual reports and between individual sections within the narrative component. Extraction accuracy varies between languages, with English exceeding 95%. We apply the extraction methods to a comprehensive sample of annual reports published by UK, Spanish and Portuguese non-financial firms between 2003 and 2014.
