Using Distributional Semantics for Automatic Taxonomy Induction

Semantic taxonomies are powerful tools that provide structured knowledge to Natural Language Processing (NLP), Information Retrieval (IR), and general Artificial Intelligence (AI) systems. These taxonomies are used extensively for knowledge-rich problems such as textual entailment and question answering. In this paper, we present a taxonomy induction system and evaluate it on the benchmarks provided by the Taxonomy Extraction Evaluation (TExEval2) Task. The task is to identify hyponym-hypernym relations and to construct a taxonomy from a given list of domain-specific terms. Our approach combines a word embedding trained on a large corpus with string-matching techniques, and the overall approach is semi-supervised. We propose a generic algorithm that makes effective use of the embedding vectors to identify hyponym-hypernym relations and to induce the taxonomy. The system generated English-language taxonomies for three domains (environment, food, and science), which were evaluated against gold-standard taxonomies. The system achieved good results for hyponym-hypernym identification and taxonomy induction, especially when compared with other tools that use similar background knowledge.
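
The abstract does not spell out the algorithm, so what follows is only a minimal sketch of how the two ingredients it names, string matching and word-embedding similarity, can be combined to attach each term to a hypernym. It is not the authors' method. The suffix heuristic (taking "tea" as the hypernym of "green tea"), the restriction of the embedding fallback to single-word terms, the 0.5 cosine threshold, the function names `suffix_hypernym` and `induce_taxonomy`, and the toy three-dimensional vectors are all illustrative assumptions. Real vectors would come from an embedding trained on a large corpus.

```python
from __future__ import annotations

import math

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def suffix_hypernym(term: str, vocab: set[str]) -> str | None:
    """Hypothetical string-matching step: return the longest proper word suffix
    of `term` that is itself in the vocabulary, e.g. 'green tea' -> 'tea'."""
    words = term.split()
    for start in range(1, len(words)):  # start=1 yields the longest proper suffix
        suffix = " ".join(words[start:])
        if suffix in vocab:
            return suffix
    return None


def induce_taxonomy(terms: list[str], embeddings: dict[str, Vector],
                    root: str, threshold: float = 0.5) -> dict[str, str]:
    """Map every non-root term to one hypernym (illustrative heuristic only).

    1. Try string matching on word suffixes.
    2. Otherwise, for multi-word terms, pick the most cosine-similar
       single-word term whose similarity exceeds `threshold`.
    3. Otherwise, attach the term to the root.
    Every edge points to a term with fewer words or to the root, so no cycles arise.
    """
    vocab = set(terms)
    heads = [t for t in terms if len(t.split()) == 1 and t != root]
    parent: dict[str, str] = {}
    for term in terms:
        if term == root:
            continue
        match = suffix_hypernym(term, vocab)
        if match is not None:
            parent[term] = match
            continue
        best, best_score = root, threshold
        if len(term.split()) > 1 and term in embeddings:
            for head in heads:
                if head in embeddings:
                    score = cosine(embeddings[term], embeddings[head])
                    if score > best_score:
                        best, best_score = head, score
        parent[term] = best
    return parent


# Toy data: hypothetical terms and made-up vectors, for illustration only.
terms = ["food", "tea", "green tea", "herbal infusion", "bread", "rye bread", "cake"]
embeddings = {
    "tea": [0.9, 0.1, 0.0],
    "bread": [0.0, 0.9, 0.2],
    "cake": [0.1, 0.3, 0.9],
    "herbal infusion": [0.8, 0.2, 0.1],
}
taxonomy = induce_taxonomy(terms, embeddings, root="food")
print(taxonomy)  # 'herbal infusion' is attached to 'tea' via the embedding fallback
```

Restricting the embedding fallback to single-word terms is a deliberate simplification: because each edge leads to a term with fewer words or to the root, the output is always a tree rooted at `root`. A practical system would also need to handle terms missing from the embedding and tune the threshold on held-out data.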
