Transliterated Word Identification and Application to Query Translation Mining

Query translation mining is a key technique in cross-language information retrieval and in knowledge acquisition for machine translation. To improve performance, queries are classified as transliterated or non-transliterated words by a transliterated word identification model and are then channeled to different mining processes. This paper is a pilot study of query classification for better translation mining performance, based on supervised classification and linguistic heuristics. Person name identification achieves a precision of over 97%, and transliterated word translation mining shows satisfactory performance.
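As an illustration of the classify-then-route step described above, here is a minimal Python sketch. Everything in it is an assumption rather than the paper's method: a multinomial naive Bayes classifier over character bigrams stands in for the unspecified supervised model, an "all characters come from a transliteration character set" rule stands in for the linguistic heuristics, and `TRANSLIT_CHARS`, the toy training queries, and the `miners` callables are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data for illustration only (not from the paper).
TOY_QUERIES = ["克林顿", "布什", "史密斯", "奥巴马", "计算机", "北京大学", "信息检索", "机器翻译"]
TOY_LABELS = ["transliterated"] * 4 + ["non_transliterated"] * 4

# Hypothetical, tiny subset of characters used in Chinese transliterations
# of foreign names; a real system would need a much larger curated set.
TRANSLIT_CHARS = set("克林顿布什史密斯奥巴马")


def char_ngrams(text, n=2):
    """Character n-grams with boundary markers, e.g. '^克', '克林', ..., '顿$'."""
    padded = f"^{text}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesQueryClassifier:
    """Multinomial naive Bayes over character bigrams with add-alpha smoothing
    (a stand-in for the paper's unspecified supervised model)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = {}          # label -> Counter of n-gram frequencies
        self.totals = Counter()   # label -> total number of n-grams seen
        self.docs = Counter()     # label -> number of training queries
        self.vocab = set()

    def fit(self, queries, labels):
        for query, label in zip(queries, labels):
            grams = char_ngrams(query)
            self.counts.setdefault(label, Counter()).update(grams)
            self.totals[label] += len(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, query):
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        scores = {}
        for label, counts in self.counts.items():
            # Log prior plus the smoothed log likelihood of each n-gram.
            score = math.log(self.docs[label] / n_docs)
            for g in char_ngrams(query):
                score += math.log((counts[g] + self.alpha) /
                                  (self.totals[label] + self.alpha * v))
            scores[label] = score
        return max(scores, key=scores.get)


def classify_query(query, classifier, translit_chars):
    """Apply the heuristic gate first, then the supervised classifier."""
    # Assumed heuristic: a transliterated query uses only characters from the
    # transliteration character set; anything else is non-transliterated.
    if not query or any(ch not in translit_chars for ch in query):
        return "non_transliterated"
    return classifier.predict(query)


def mine_translations(query, classifier, translit_chars, miners):
    """Channel the query to the mining process registered for its class."""
    label = classify_query(query, classifier, translit_chars)
    return miners[label](query)
```

With `clf = NaiveBayesQueryClassifier().fit(TOY_QUERIES, TOY_LABELS)`, the unseen query 布林顿 passes the character gate and is classified as transliterated, while 信息检索 fails the gate and is routed to the non-transliterated miner. Running the cheap heuristic before the classifier means only plausible candidates reach the statistical model; a real system would likely use richer features than character bigrams.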
