Ontology population for open‐source intelligence: A GATE‐based solution

Open‐Source INTelligence (OSINT) is intelligence based on publicly available sources such as news sites, blogs, and forums. The Web is the primary source of information, but once data are crawled, they need to be interpreted and structured. Ontologies can play a crucial role in this process, but because of the vast number of documents available, automatic mechanisms are needed to populate them from the crawled text. This paper presents an approach for automatically populating predefined ontologies with data extracted from text, and discusses the design and realization of a pipeline based on the General Architecture for Text Engineering (GATE) system, which is of interest to both researchers and practitioners in the field. Experimental results, which are encouraging in terms of correctly extracted ontology instances, are also reported. The paper also describes an alternative approach, with additional experiments, for one phase of the pipeline that requires predefined dictionaries of relevant entities. This variant reduces the manual workload required in that phase while still obtaining promising results.
