Using Data Crawlers and Semantic Web to Build Financial XBRL Data Generators: The SONAR Extension Approach

Precise, reliable, and real-time financial information is critical for value-added financial services in the wake of the economic turmoil from which markets are still struggling to recover. Since the Web has become the most significant data source, intelligent crawlers based on Semantic Technologies have become trailblazers in the search for knowledge, combining natural language processing and ontology engineering techniques. In this paper, we present the SONAR extension approach, which leverages knowledge representation to extract, manage, and transform scarce and dispersed financial information into well-classified, structured knowledge in the widely used XBRL format. The approach is supported by a proof-of-concept implementation and a thorough evaluation of its benefits.
