Word Vector Embeddings and Domain Specific Semantic based Semi-Supervised Ontology Instance Population

An ontology defines a set of representational primitives that model a domain of knowledge or discourse. With the rise of fields such as information extraction and knowledge management, ontologies have become a driving factor in many modern systems. Ontology population, however, is an inherently problematic process, because it needs manual intervention to prevent conceptual drift. Semantically sensitive word embeddings have become a popular topic in natural language processing because of their ability to cope with semantic challenges. Incorporating domain-specific semantic similarity into word embeddings could improve semantic similarity performance within specific domains. In this study, we therefore propose a novel method of semi-supervised ontology population based on word embeddings and domain-specific semantic similarity. We built several models, including traditional benchmark models and new types of models based on word embeddings. Finally, we combined them into an ensemble, producing a synergistic model that outperformed the best-performing candidate model by 33%.
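To make the idea of embedding-based semi-supervised population concrete, the sketch below assigns candidate words to ontology classes by cosine similarity to the centroid of each class's seed instances. It is only an illustration and not the method evaluated in the study: the three-dimensional vectors, the class names, the 0.8 threshold, and the helper names (`cosine`, `centroid`, `populate`) are all assumptions invented for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def populate(seeds, candidates, embeddings, threshold=0.8):
    """Assign each candidate to the most similar class, or to none.

    seeds: class name -> list of manually chosen seed instances.
    The threshold (an assumed value) rejects weak matches, which is one
    simple way to limit conceptual drift without manual review.
    """
    centroids = {
        cls: centroid([embeddings[w] for w in words])
        for cls, words in seeds.items()
    }
    assigned = {cls: [] for cls in seeds}
    for word in candidates:
        scores = {cls: cosine(embeddings[word], c) for cls, c in centroids.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            assigned[best].append(word)
    return assigned


# Hypothetical toy embeddings; real ones would come from a trained model.
EMB = {
    "judge": [0.9, 0.1, 0.0],
    "attorney": [0.8, 0.2, 0.1],
    "statute": [0.1, 0.9, 0.1],
    "regulation": [0.2, 0.8, 0.0],
    "prosecutor": [0.85, 0.15, 0.05],
    "ordinance": [0.15, 0.85, 0.05],
    "banana": [0.0, 0.1, 0.9],
}
SEEDS = {"Person": ["judge", "attorney"], "Law": ["statute", "regulation"]}

result = populate(SEEDS, ["prosecutor", "ordinance", "banana"], EMB)
print(result)
```

In this toy setup the unrelated word falls below the threshold for every class and is left unassigned. An ensemble in the spirit of the abstract could combine such a score with other similarity measures, but how the study weights its candidate models is not stated here.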
