Domain Ontology Induction Using Word Embeddings

An ontology, the shared formal conceptualization of domain information, has been shown to have multiple applications in modeling, processing, and understanding natural language text. In this work, we use distributed word vectors produced by several recent deep learning language models for the semi-automated creation of ontologies for closed domains. Using these vectors, we cover all major aspects of domain ontology induction (also called ontology learning): concept identification, attribute identification, and taxonomic and non-taxonomic relationship identification. Preliminary results show that simple clustering-based methods using distributed word vectors from these language models outperform methods using models such as LSI in ontology learning for closed domains.
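As a rough illustration of the clustering-based concept identification mentioned above, the sketch below groups domain terms by the cosine similarity of their word vectors, using a small k-means-style loop. The abstract does not specify the exact algorithm, so this is only one plausible reading. The three-dimensional vectors in `TOY_VECTORS` are invented for the example, and the helper name `cluster_terms` is hypothetical. In practice the vectors would come from a trained embedding model.

```python
import math

# Hypothetical toy vectors; real ones would come from a trained embedding model.
TOY_VECTORS = {
    "engine":  [0.90, 0.10, 0.00],
    "brake":   [0.80, 0.20, 0.10],
    "gearbox": [0.85, 0.15, 0.05],
    "doctor":  [0.10, 0.90, 0.10],
    "nurse":   [0.05, 0.85, 0.20],
    "surgeon": [0.15, 0.95, 0.05],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cluster_terms(vectors, k, iterations=10):
    """Group terms into k clusters with a k-means-style loop over cosine similarity."""
    terms = sorted(vectors)
    # Deterministic farthest-first initialisation: start from the first term,
    # then repeatedly add the term least similar to the centroids chosen so far.
    centroids = [vectors[terms[0]]]
    while len(centroids) < k:
        far = min(terms, key=lambda t: max(cosine(vectors[t], c) for c in centroids))
        centroids.append(vectors[far])

    assignment = {}
    for _ in range(iterations):
        # Assign each term to its most similar centroid.
        assignment = {
            t: max(range(k), key=lambda i: cosine(vectors[t], centroids[i]))
            for t in terms
        }
        # Recompute each centroid as the mean of its members.
        for i in range(k):
            members = [vectors[t] for t in terms if assignment[t] == i]
            if members:
                centroids[i] = mean(members)

    groups = {}
    for term, index in assignment.items():
        groups.setdefault(index, []).append(term)
    return sorted(sorted(group) for group in groups.values())

if __name__ == "__main__":
    for group in cluster_terms(TOY_VECTORS, k=2):
        print(group)
```

Each resulting group can be read as a candidate concept whose members are its lexical realisations. Choosing `k` and labelling the clusters are left open here, since the abstract does not describe how those steps are handled.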
