Segment Information Extraction from Financial Annual Reports Using Neural Network

This paper extends a paper selected from JSAI2019. Automatically extracting business content from financial reports is an important problem in finance. Segment names and their explanations are particularly important to extract, yet no established method exists for extracting them from financial reports. In this study, we aim to develop a practical solution to this problem. We built a manually annotated dataset for the task of extracting each company's segment names and their explanations from financial reports, and then developed a recurrent neural network model for the task. Trained on this dataset, our method outperformed the baseline methods at extracting segment names and their explanations from annual financial reports. We also demonstrated experimentally that our method remains usable for this task even when only a small training dataset is available. This is the first work to apply a machine learning method to the task of extracting segment names and their explanations. The insights from this work should be valuable to industry.
