Unsupervised Terminological Ontology Learning Based on Hierarchical Topic Modeling

In this paper, we present hierarchical relation-based latent Dirichlet allocation (hrLDA), a data-driven hierarchical topic model for extracting terminological ontologies from a large number of heterogeneous documents. In contrast to traditional topic models, hrLDA relies on noun phrases instead of unigrams, takes syntax and document structure into account, and enriches topic hierarchies with topic relations. Through a series of experiments, we demonstrate the superiority of hrLDA over existing topic models, especially for building hierarchies. Furthermore, we illustrate the robustness of hrLDA on noisy data sets, which are likely to occur in many practical scenarios. Our ontology evaluation results show that ontologies extracted by hrLDA are very competitive with ontologies created by domain experts.
