Can Distributed Word Embeddings be an alternative to costly linguistic features: A Study on Parsing Hindi

Word embeddings have been shown to be useful in a wide range of NLP tasks. We explore methods of using embeddings in dependency parsing of Hindi, a MoR-FWO (morphologically rich, relatively free word order) language. We show that embeddings not only improve parsing quality but can also act as a cheap alternative to traditional features, which are costly to acquire. We demonstrate that using distributed representations of lexical items, instead of features produced by costly tools such as a morphological analyzer, yields competitive results. This implies that a monolingual corpus alone can suffice to achieve good accuracy for resource-poor languages for which such tools are unavailable. We also explore the importance of these representations for domain adaptation.
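The core substitution described above, with lexical vectors standing in for analyzer output, can be sketched as a feature extractor for a parser. This is a minimal sketch that assumes a transition-based parser whose configuration is a stack and a buffer of word forms. The feature template (top of stack and first buffer word), the function names `embed` and `embedding_features`, and the toy 3-dimensional vectors are all hypothetical and are not taken from the paper. A conventional feature set would include morphological analyzer output such as gender, number, person, and case for each token. This sketch instead concatenates each token's embedding and uses a zero vector for empty positions and out-of-vocabulary words.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]

# Toy 3-dimensional embeddings. The values are made up for illustration and are
# NOT trained vectors; in practice they would be learned (e.g. word2vec-style)
# from a large monolingual Hindi corpus, with far more dimensions.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "राम": [0.2, -0.1, 0.5],
    "ने": [0.0, 0.3, -0.2],
    "सेब": [0.4, 0.1, 0.1],
    "खाया": [-0.3, 0.6, 0.0],
}
DIM = 3


def embed(word: Optional[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Return a copy of the word's vector; empty slots and unknown words map to zeros."""
    if word is None or word not in table:
        return [0.0] * dim
    return list(table[word])


def embedding_features(stack: Sequence[str], buffer: Sequence[str],
                       table: Dict[str, Vector], dim: int) -> Vector:
    """Dense features for one parser configuration (hypothetical template).

    Concatenates the vectors of the top of the stack (stack[-1]) and the next
    input word (buffer[0]). A feature set built on a morphological analyzer
    would place per-token gender/number/person/case values here instead.
    """
    s0 = stack[-1] if stack else None
    b0 = buffer[0] if buffer else None
    return embed(s0, table, dim) + embed(b0, table, dim)


if __name__ == "__main__":
    # Configuration after shifting "राम" while parsing "राम ने सेब खाया".
    feats = embedding_features(["राम"], ["ने", "सेब", "खाया"], TOY_EMBEDDINGS, DIM)
    print(feats)  # → [0.2, -0.1, 0.5, 0.0, 0.3, -0.2]
```

The lookup table needs only raw monolingual text to build, which is what makes it a cheap replacement for an analyzer. The zero-vector fallback is one simple choice for out-of-vocabulary words. A dedicated learned "unknown" vector is a common alternative.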
