Automated Ontology Extraction from Unstructured Texts using Deep Learning

Ontologies are computational artifacts that represent knowledge through classes and the relations between them. Constructing these knowledge bases requires considerable human effort because it involves both domain experts and knowledge engineers. Ontology Learning aims to build ontologies automatically from data sources such as multimedia, web pages, databases, and unstructured text. In this work, we propose a methodology to automatically build an ontology that represents the concept map of a subject for use in academic contexts. The main contribution of this methodology is that it does not require handcrafted features: it uses Deep Learning techniques to identify taxonomic and semantic relations between the concepts of a specific domain. In addition, because the approach relies on transfer learning, no domain-specific dataset is needed. The relation classification model is trained on Wikipedia and WordNet using distant supervision, and the learned knowledge is transferred to the target domain through word embeddings. The results are promising, considering that the approach requires neither human intervention nor feature engineering.
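The following minimal Python sketch makes the distant-supervision step concrete by showing how relation labels can be produced without manual annotation. It assumes a hypothetical toy knowledge base `KB` standing in for relation pairs taken from WordNet, and a few made-up sentences standing in for Wikipedia text. The function name `distant_label` and the whole-word matching rule are illustrative choices, not the authors' implementation.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base standing in for relation pairs mined from
# WordNet: (head concept, tail concept) -> relation label.
KB: Dict[Tuple[str, str], str] = {
    ("dog", "animal"): "hypernym",
    ("wheel", "car"): "part_of",
}

# Made-up sentences standing in for an unlabeled Wikipedia corpus.
SENTENCES: List[str] = [
    "A dog is a domesticated animal.",
    "Every car has a steering wheel.",
    "The dog chased the car.",
]


def distant_label(
    sentences: List[str], kb: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, str, str]]:
    """Return (sentence, head, tail, relation) for every sentence that
    mentions both concepts of a KB pair as whole words.

    Distant-supervision assumption: such a sentence is taken to express the
    KB relation, so the resulting labels are noisy by construction.
    """
    examples: List[Tuple[str, str, str, str]] = []
    for sentence in sentences:
        # Lowercased alphabetic tokens; no stemming, so "wheels" != "wheel".
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        for (head, tail), relation in kb.items():
            if head in tokens and tail in tokens:
                examples.append((sentence, head, tail, relation))
    return examples


if __name__ == "__main__":
    for example in distant_label(SENTENCES, KB):
        print(example)
```

Each labeled sentence would then serve as a training example for the relation classifier. The third sentence yields no example because the pair (dog, car) has no entry in the knowledge base. Sentences that do match are labeled even if they do not actually express the relation, which is the well-known source of noise in distant supervision.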
