From hyperlinks to Semantic Web properties using Open Knowledge Extraction

Open information extraction approaches are useful but insufficient alone for populating the Web with machine-readable information, because their results are not directly linkable to, or immediately reusable from, other Linked Data sources. This work proposes a novel paradigm, named Open Knowledge Extraction, and its implementation (Legalo), which performs unsupervised, open-domain, abstractive knowledge extraction from text to produce machine-readable information. The implemented method is based on the hypothesis that hyperlinks (created either by humans or by knowledge extraction tools) provide a pragmatic trace of semantic relations between two entities, and that such semantic relations, together with their subjects and objects, can be revealed by processing their linguistic traces (i.e. the sentences that embed the hyperlinks) and formalised as Semantic Web triples and ontology axioms. Experimental evaluations conducted on validated text extracted from Wikipedia pages, with the help of crowdsourcing, confirm this hypothesis with high performance. A demo is available at http://wit.istc.cnr.it/stlab-tools/legalo.
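To make the target output more concrete, the following is a minimal sketch of how a relation between two hyperlinked entities could be written down as a Semantic Web triple. It is not Legalo's algorithm: the linking phrase is supplied by hand rather than extracted from a sentence, and the namespace `PROPERTY_NS` and the helpers `property_label` and `to_ntriple` are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical namespace for generated properties; not taken from the paper.
PROPERTY_NS = "http://example.org/property/"


def property_label(phrase: str) -> str:
    """Turn a linking phrase such as 'was born in' into a camelCase label."""
    words = re.findall(r"[A-Za-z0-9]+", phrase.lower())
    if not words:
        raise ValueError("linking phrase contains no usable words")
    return words[0] + "".join(w.capitalize() for w in words[1:])


def to_ntriple(subject_uri: str, phrase: str, object_uri: str) -> str:
    """Serialise one subject-property-object statement as an N-Triples line."""
    prop_uri = PROPERTY_NS + property_label(phrase)
    return f"<{subject_uri}> <{prop_uri}> <{object_uri}> ."


# Assumed input: the page entity, the entity its hyperlink points to,
# and the phrase that connects them in the embedding sentence.
triple = to_ntriple(
    "http://dbpedia.org/resource/Alan_Turing",
    "was born in",
    "http://dbpedia.org/resource/London",
)
print(triple)
```

The sketch covers only the final formalisation step. Deriving the linking phrase from the sentence that embeds the hyperlink, and aligning the resulting property with existing Linked Data vocabularies, are the parts the paper itself addresses.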
