CrumbTrail: An efficient methodology to reduce multiple inheritance in knowledge graphs

Abstract: In this paper, we present CrumbTrail, an algorithm to clean large and dense knowledge graphs. CrumbTrail removes cycles, out-of-domain nodes, and non-essential nodes, i.e., those that can be safely removed without breaking the knowledge graph's connectivity. It does so through bottom-up topological pruning guided by a set of input concepts, which a user can select, for instance, to identify a domain of interest. Our technique can be applied both to noisy hypernymy graphs (typically generated by ontology learning algorithms as intermediate representations) and to crowdsourced resources like Wikipedia, in order to obtain clean, domain-focused concept hierarchies. CrumbTrail overcomes the time and space complexity limitations of current state-of-the-art algorithms. In addition, we show in a variety of experiments that it outperforms them in tasks such as pruning automatically acquired taxonomy graphs and domain adaptation of the Wikipedia category graph.
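To make the idea of seed-driven, bottom-up pruning concrete, the following is a minimal sketch and not the CrumbTrail algorithm itself. It assumes the graph is given as (hyponym, hypernym) pairs and that the input concepts are the starting nodes. It walks upward from the seeds, keeps only the nodes reached that way (so out-of-domain nodes fall away), and drops any edge that would close a cycle. The function name `prune_bottom_up` and the input format are illustrative assumptions; the removal of non-essential nodes described above is not covered.

```python
def prune_bottom_up(edges, seeds):
    """Illustrative sketch only (not the published CrumbTrail procedure).

    edges: iterable of (hyponym, hypernym) pairs.
    seeds: input concepts that identify the domain of interest.
    Returns the kept edges: an acyclic subgraph restricted to nodes
    reachable upward from the seeds.
    """
    # Map each node to its direct hypernyms.
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, []).append(parent)

    kept = set()
    state = {}  # node -> "active" (on the current DFS path) or "done"

    def visit(node):
        state[node] = "active"
        for parent in parents.get(node, ()):
            if state.get(parent) == "active":
                # Back edge: keeping it would close a cycle, so drop it.
                continue
            kept.add((node, parent))
            if parent not in state:
                visit(parent)
        state[node] = "done"

    for seed in seeds:
        if seed not in state:
            visit(seed)
    return kept


# Hypothetical toy graph: one cycle and one out-of-domain branch.
toy_edges = [
    ("dog", "mammal"),
    ("mammal", "animal"),
    ("animal", "dog"),      # closes a cycle
    ("car", "vehicle"),     # unreachable from the seed
]
pruned = prune_bottom_up(toy_edges, ["dog"])
```

The recursive traversal keeps the sketch short; on graphs as large as the Wikipedia category graph, an explicit stack would be needed to avoid hitting Python's recursion limit.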
