Ontology mediated information extraction in financial domain with Mastro System-T

Information extraction (IE) is the task of turning text documents into a structured form so that the information they contain can be processed automatically. Ontology Mediated Information Extraction (OMIE) is a new paradigm for IE that seeks to exploit the semantic knowledge expressed in ontologies to improve query answering over unstructured data (specifically, raw text). In this paper we present Mastro System-T, an OMIE tool developed in collaboration between the University of Rome "La Sapienza" and IBM Research Almaden, and its first application in the financial domain, namely facilitating access to and sharing of data extracted from the EDGAR system.
