Towards a Universal Taxonomy of Many Concepts

Knowledge is indispensable to understanding. The ongoing information explosion highlights the need to enable machines to better understand electronic text written in natural human language, and the challenge lies in how to transfer human knowledge to machines. Much work has been devoted to creating universal ontologies for this purpose, yet none of the existing ontologies has the depth and breadth needed to offer “universal understanding.” In this paper, we present a universal, probabilistic ontology that is more comprehensive than any existing ontology. It currently contains 2.7 million concepts harnessed automatically from a corpus of 1.68 billion web pages and two years’ worth of search log data. Unlike traditional knowledge bases, which treat knowledge as black and white, it supports probabilistic interpretations of the information it contains, and this probabilistic nature allows it to incorporate heterogeneous information in a natural way. We describe how the core ontology is constructed and how it models the inherent uncertainty, ambiguity, and inconsistency of knowledge. We also discuss potential applications, such as understanding user intent, that can benefit from the taxonomy.
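
The abstract does not spell out how isA pairs are extracted from the web corpus or how the probabilistic scores are computed. As a rough, hypothetical illustration of that style of pipeline, the sketch below uses a Hearst-style “such as” pattern to pull (concept, instance) pairs from text and turns raw co-occurrence counts into a score P(concept | instance). The mini-corpus, the single pattern, and the function names are illustrative assumptions, not the paper’s actual implementation.

```python
import re
from collections import defaultdict

# Hypothetical mini-corpus standing in for the web-scale text the paper mines.
SENTENCES = [
    "domestic animals such as cats, dogs and rabbits",
    "companies such as Microsoft and Google",
    "animals such as dogs and horses",
    "tech companies such as Google, Apple and Microsoft",
]

# One Hearst-style pattern: "<concept> such as <instance>, <instance> and <instance>".
# Simplification: the instance list is assumed to end the sentence.
PATTERN = re.compile(r"([a-z ]+?)\s+such as\s+([a-z, ]+)$")

def extract_pairs(sentences):
    """Yield (concept, instance) pairs found by the 'such as' pattern."""
    for sentence in sentences:
        match = PATTERN.search(sentence.strip().lower())
        if not match:
            continue
        concept = match.group(1).split()[-1]            # keep the head noun only
        for instance in re.split(r",|\band\b", match.group(2)):
            instance = instance.strip()
            if instance:
                yield concept, instance

# Co-occurrence counts n(concept, instance) accumulated over the corpus.
counts = defaultdict(lambda: defaultdict(int))
for concept, instance in extract_pairs(SENTENCES):
    counts[instance][concept] += 1

def p_concept_given_instance(instance):
    """Probabilistic interpretation: P(concept | instance) from raw counts."""
    dist = counts.get(instance)
    if not dist:
        return {}
    total = sum(dist.values())
    return {concept: n / total for concept, n in dist.items()}

print(p_concept_given_instance("dogs"))       # {'animals': 1.0}
print(p_concept_given_instance("microsoft"))  # {'companies': 1.0}
```

At web scale the same counts would be aggregated over billions of pages and many more patterns, and ambiguous instances (an "apple" splitting its probability mass between companies and fruits, say) would naturally receive a distribution over concepts rather than a single hard label, which is the kind of non-black-and-white interpretation the abstract emphasizes.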
