Identification and classification of relations for Indian languages using machine learning approaches for developing a domain specific ontology

Information extraction and classification using Natural Language Processing techniques of layered architecture such as pre-processing task, processing of semantic analysis etc., helps in implementing further deeper evaluation techniques for the accuracy of natural language based electronic database. This paper explores relational information extraction of multilingual IndoWordNet database matching with domain specific terms. Further, extracted information are processed through conventional statistical methods, Normalized Web Distance (NWD) similarity method and two other machine learning evaluation techniques such as Support Vector Machine (SVM), Neural Network (NN) to compare with their accuracy. Results of machine leaning based techniques outperform with significant improved accuracy over conventional methods. The objective of using these techniques along with semantic web technology is to initiate a proof of concept for ontology generation by identification and classification of extracted relational information from IndoWordNet. This paper also highlights domain specific challenges and issues in developing relational model of ontology.

[1]  Ming Li,et al.  Normalized Information Distance , 2008, ArXiv.

[2]  Dong-Hong Ji,et al.  Unsupervised Feature Selection for Relation Extraction , 2005, IJCNLP.

[3]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[4]  Mohammed Bennamoun,et al.  Ontology learning from text: A look back and into the future , 2012, CSUR.

[5]  Takashi Chikayama,et al.  Simple Customization of Recursive Neural Networks for Semantic Relation Classification , 2013, EMNLP.

[6]  Chih-Jen Lin,et al.  A comparison of methods for multiclass support vector machines , 2002, IEEE Trans. Neural Networks.

[7]  Luke S. Zettlemoyer,et al.  Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations , 2011, ACL.

[8]  Éric Gaussier,et al.  A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation , 2005, ECIR.

[9]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[10]  Paul Buitelaar,et al.  Ontology Learning from Text: An Overview , 2005 .

[11]  Daniel Jurafsky,et al.  Distant supervision for relation extraction without labeled data , 2009, ACL.

[12]  A. Kolmogorov Three approaches to the quantitative definition of information , 1968 .

[13]  Ralph Grishman,et al.  Discovering Relations among Named Entities from Large Corpora , 2004, ACL.

[14]  Pushpak Bhattacharyya,et al.  Semi-supervised Relation Extraction using EM Algorithm , 2013 .

[15]  Lina Zhou,et al.  Ontology learning: state of the art and open issues , 2007, Inf. Technol. Manag..

[16]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[17]  Andrew McCallum,et al.  Modeling Relations and Their Mentions without Labeled Text , 2010, ECML/PKDD.

[18]  Chu-Ren Huang,et al.  22nd International Conference on Computational Linguistics , 2008 .

[19]  Hiroshi Nakagawa,et al.  Reducing Wrong Labels in Distant Supervision for Relation Extraction , 2012, ACL.

[20]  Megha Garg,et al.  Identification of relations from IndoWordNet for Indian languages using Support Vector Machine , 2015, 2015 International Conference on Computing and Network Communications (CoCoNet).