Paper Information - A Hybrid Model for Urdu Hindi Transliteration

A Hybrid Model for Urdu Hindi Transliteration

This paper reports a novel hybrid approach to Urdu-to-Hindi transliteration that combines finite-state machine (FSM) based techniques with a statistical word language model. The output of the FSM is filtered with the word language model to produce the correct Hindi output. The main problem addressed is the omission of diacritical marks from the input Urdu text: the system produces the correct Hindi output even when the crucial information carried by diacritical marks is absent. The approach raises accuracy from 50.7% for the transducer-only approach to 79.1%. The results show that a word language model improves performance by disambiguating the output of the transducer, especially when diacritical marks are missing from the Urdu input.
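The filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the transducer can be stood in for by a lookup table from undiacritized tokens to candidate words, that the word language model is a bigram model with add-one smoothing, and that the best sequence is found with a Viterbi search. All tokens, the corpus, and the names `CANDIDATES`, `CORPUS`, `train`, `log_prob` and `transliterate` are hypothetical, and words are romanized for readability.

```python
import math
from collections import Counter

# Hypothetical stand-in for the FSM: each undiacritized input token maps to
# every word it could spell once the missing short vowels are guessed.
CANDIDATES = {
    "mN": ["main", "men"],
    "kl": ["kal", "kul", "kil"],
    "aya": ["aaya"],
}

# Tiny invented corpus used to estimate the word language model.
CORPUS = ["main kal aaya", "main kal gaya", "kul log aaye"]


def train(corpus):
    """Count bigrams and their histories; return (history, bigrams, vocab_size)."""
    history, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        prev = "<s>"
        for word in sentence.split():
            history[prev] += 1
            bigrams[(prev, word)] += 1
            vocab.add(word)
            prev = word
    return history, bigrams, len(vocab)


def log_prob(prev, word, model):
    """Add-one smoothed bigram log-probability log P(word | prev)."""
    history, bigrams, vocab_size = model
    return math.log((bigrams[(prev, word)] + 1) / (history[prev] + vocab_size))


def transliterate(tokens, model=None):
    """Pick the candidate sequence that the language model scores highest."""
    model = model or train(CORPUS)
    # best[w] = (log-probability of the best path ending in w, that path)
    best = {"<s>": (0.0, [])}
    for token in tokens:
        step = {}
        # Unknown tokens pass through unchanged as their own only candidate.
        for word in CANDIDATES.get(token, [token]):
            step[word] = max(
                ((score + log_prob(prev, word, model), path + [word])
                 for prev, (score, path) in best.items()),
                key=lambda item: item[0],
            )
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


print(transliterate(["mN", "kl", "aya"]))
print(transliterate(["kl"]))
```

In this toy setup the ambiguous token `kl` resolves to a different word depending on its neighbours, which is the kind of disambiguation a lookup on the transducer output alone cannot provide.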

Pushpak Bhattacharyya | Christian Boitet | Laurent Besacier | Muhammad Ghulam Abbas Malik
