Lexical Resources for Hindi Marathi MT

In this paper we describe ways of utilizing lexical resources to improve the quality of statistical machine translation (SMT). We augment the training corpus with lexical resources such as the IndoWordnet semantic relation set, function words, kridanta (participle) pairs and verb phrases. The parallel corpora are augmented in two ways: (a) with additional vocabulary and (b) with inflected word forms. We present case studies and evaluations, and give a detailed error analysis for both the Marathi-to-Hindi and the Hindi-to-Marathi machine translation systems. The evaluations show an order-of-magnitude improvement in translation quality, indicating that lexical resources do help uplift performance when parallel corpora are scarce.
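The abstract does not say exactly how the resources are merged into the training data. The sketch below assumes one common approach: each lexical-resource entry is appended to the parallel corpus as its own one-line segment pair, covering (a) additional vocabulary, and any listed inflected forms of that entry are appended too, covering (b) inflected word forms. The names `augment_corpus`, `Pair` and `inflections`, and the romanized example words, are hypothetical illustrations. They are not the authors' code or data.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical type: one aligned (source segment, target segment) pair.
Pair = Tuple[str, str]


def augment_corpus(
    corpus: List[Pair],
    lexicon: Iterable[Pair],
    inflections: Dict[Pair, List[Pair]],
) -> List[Pair]:
    """Return a copy of `corpus` with lexical-resource entries appended.

    (a) Each lexicon pair (e.g. a synset word pair, function word,
        kridanta pair or verb phrase) is added as a one-line segment pair.
    (b) Inflected (source, target) forms listed for that pair in
        `inflections` are added right after it.
    Pairs already present are skipped, so repeats do not inflate counts.
    """
    augmented = list(corpus)   # leave the caller's corpus unmodified
    seen = set(corpus)         # tuples are hashable, so a set works
    for pair in lexicon:
        for candidate in [pair, *inflections.get(pair, [])]:
            if candidate not in seen:
                seen.add(candidate)
                augmented.append(candidate)
    return augmented


if __name__ == "__main__":
    # Illustrative romanized Hindi -> Marathi placeholders only.
    corpus = [("main ghar ja raha hoon", "mi ghari jat aahe")]
    lexicon = [("ladka", "mulga"), ("ghar", "ghar"), ("ladka", "mulga")]
    inflections = {("ladka", "mulga"): [("ladke", "mulge")]}
    for src, tgt in augment_corpus(corpus, lexicon, inflections):
        print(src, "|||", tgt)
```

The repeated `("ladka", "mulga")` entry is added only once, so the result has four pairs. The augmented list could then be written out as line-aligned source and target files, the plain-text parallel format that SMT training pipelines such as Moses consume.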