Embedding Linguistic Features in Word Embedding for Preposition Sense Disambiguation in English - Malayalam Machine Translation Context

Preposition sense disambiguation is important in natural language processing tasks such as machine translation. A simple preposition in the source language can carry several senses, each of which may map to a different form in the target language. These many-to-many relationships make sense transfer complex, particularly in English-Malayalam machine translation. To reduce this complexity, in this paper we use linguistic information, namely the noun class and verb class features of the noun and verb associated with the target simple preposition. We study the effect of these features on classifying the senses (postpositions in Malayalam) using several machine learning algorithms. The study showed that classification accuracy is higher when both verb and noun class features are taken into consideration. In linguistics, the major factor that decides the sense of a preposition is the noun in the prepositional phrase. The study showed the same trend when the training data contained only noun class features; that is, noun class features dominate verb class features.
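The idea of adding class features to an embedding can be illustrated with a minimal sketch. It is not the paper's implementation: the class inventories, the embedding values, the sense labels, and the helper names `build_features` and `predict_1nn` are all hypothetical, and a one-nearest-neighbour rule stands in for the machine learning algorithms mentioned above. It only shows how one-hot noun class and verb class features could be appended to a preposition's embedding so that otherwise identical vectors become separable.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical class inventories; the paper's actual classes are not given here.
NOUN_CLASSES = ["location", "time", "person", "instrument"]
VERB_CLASSES = ["motion", "stative", "communication"]


def one_hot(label: str, inventory: Sequence[str]) -> List[float]:
    """Encode a class label as a one-hot vector over the given inventory."""
    return [1.0 if label == item else 0.0 for item in inventory]


def build_features(
    embedding: Sequence[float],
    noun_class: Optional[str] = None,
    verb_class: Optional[str] = None,
) -> List[float]:
    """Concatenate an embedding with optional noun and verb class features."""
    features = list(embedding)
    if noun_class is not None:
        features += one_hot(noun_class, NOUN_CLASSES)
    if verb_class is not None:
        features += one_hot(verb_class, VERB_CLASSES)
    return features


def predict_1nn(train: Sequence[Tuple[List[float], str]], query: Sequence[float]) -> str:
    """Return the label of the training example closest to the query
    (squared Euclidean distance). A stand-in for a real classifier."""
    best_label, best_dist = "", float("inf")
    for features, label in train:
        dist = sum((a - b) ** 2 for a, b in zip(features, query))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Made-up embedding for the preposition "in"; the same vector appears in every
# context, so the embedding alone cannot separate the senses.
IN_EMBEDDING = [0.2, -0.1, 0.4]

# Placeholder sense labels; in the paper the targets are Malayalam postpositions.
train = [
    (build_features(IN_EMBEDDING, "location", "stative"), "sense_locative"),
    (build_features(IN_EMBEDDING, "time", "motion"), "sense_temporal"),
]

query = build_features(IN_EMBEDDING, "time", "communication")
predicted = predict_1nn(train, query)
print(predicted)
```

In this toy setup the query shares its noun class with the second training example and its verb class with neither, so the noun class alone decides the prediction. Passing only `noun_class` or only `verb_class` to `build_features` gives the reduced feature sets that such a comparison would need.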
