Integration of Structured and Unstructured Data in the Financial Analysis Domain - A State of the Art

There is a consensus (Hoffman and Strand, 2001; Hannon, 2002; Bovee et al., 2005; Willis, 2005; Cox, 2006) that XBRL (Extensible Business Reporting Language), a technical standard for facilitating the transfer and analysis of financial statements, could improve the speed and quality of transmitting and analyzing financial reports, and even their accuracy, by providing machine-readable documents. Footnotes, which contain important explanations of financial values, still have an unstructured format and are therefore an obstacle for analysts and other stakeholders who want to benefit from analyzing financial statements. It is of interest how data integration approaches can support the automatic extraction of data from footnotes, so that accurate and reasonable analyses can be obtained without manual tasks. To address this issue, a state-of-the-art review is needed to identify and cluster relevant existing methods for integrating structured and unstructured data. The review shows that most of the existing literature focuses on data integration at the storage level. Other researchers deal with methods and tools to integrate and analyze structured and unstructured data separately. However, no identified paper presents an unstructured data integration solution that supports analytical tasks based on XBRL documents.

[1]  Christiane Dominovic.  ARCH: incorporating usability into a data integration framework , 2009, iiWAS.

[2]  Carsten Felden,et al.  Integrating structured and unstructured data in a business-intelligence-system , 2006 .

[3]  Andreas Harth,et al.  Challenges Ahead for Converging Financial Data , 2009 .

[4]  Maciej Piechocki,et al.  XBRL financial reporting supply chain architecture , 2007 .

[5]  Thomas Klose,et al.  Text mining and visualization tools - Impressions of emerging capabilities , 2008 .

[6]  Elisa Bertino,et al.  XML and Data Integration , 2001, IEEE Internet Comput..

[7]  Jia-Lang Seng,et al.  A schema and ontology-aided intelligent information integration , 2009, Expert Syst. Appl..

[8]  Hans-Georg Kemper,et al.  Management Support with Structured and Unstructured Data—An Integrated Business Intelligence Framework , 2008, Inf. Syst. Manag..

[9]  Franz J. Kurfess,et al.  Ontology-Based Semantic Classification of Unstructured Documents , 2003, Adaptive Multimedia Retrieval.

[10]  Mike Willis.  XBRL and Data Standardization: Transforming the Way CPAs Work; Save Time and Improve Reporting , 2005 .

[11]  M. Bradbury,et al.  Capitalizing Non-Cancelable Operating Leases , 2003 .

[12]  Diane J. Janvrin,et al.  The Process Of Creating XBRL Instance Documents: A Research Framework , 2010, BIS 2010.

[13]  Carlos H. Caldas,et al.  Management and analysis of unstructured construction data types , 2008, Adv. Eng. Informatics.

[14]  David P. Franz,et al.  Evaluating Constructive Lease Capitalization and Off‐Balance‐Sheet Financing: An Instructional Case with FedEx and UPS , 2012 .

[15]  Hannu Vanharanta,et al.  Combining data and text mining techniques for analysing financial reports , 2004, Intell. Syst. Account. Finance Manag..

[17]  Chin-Tsai Lin,et al.  Mining the text information to optimizing the customer relationship management , 2009, Expert Syst. Appl..

[18]  Yi-fang Brook Wu,et al.  Information Mining: Integrating Data Mining and Text Mining for Business Intelligence , 2006, AMCIS.

[19]  Jennifer Widom,et al.  The TSIMMIS Project: Integration of Heterogeneous Information Sources , 1994, IPSJ.

[20]  Carsten Felden,et al.  Break-Up Analysis: A Method to Regain Trust in Business Transactions , 2013 .

[21]  Miklos A. Vasarhelyi,et al.  Financial Reporting and Auditing Agent with Net Knowledge (FRAANK) and eXtensible Business Reporting Language (XBRL) , 2005, J. Inf. Syst..