Machine Learning Based English-to-Korean Transliteration Using Grapheme and Phoneme Information

Machine transliteration is an automatic method to generate characters or words in one alphabetical system for the corresponding characters in another alphabetical system. Machine transliteration can play an important role in natural language application such as information retrieval and machine translation, especially for handling proper nouns and technical terms. The previous works focus on either a grapheme-based or phoneme-based method. However, transliteration is an orthographical and phonetic converting process. Therefore, both grapheme and phoneme information should be considered in machine transliteration. In this paper, we propose a grapheme and phoneme-based transliteration model and compare it with previous grapheme-based and phoneme-based models using several machine learning techniques. Our method shows about 13∼78% performance improvement.

[1]  Frank K. Soong,et al.  A Tree.Trellis Based Fast Search for Finding the N Best Sentence Hypotheses in Continuous Speech Recognition , 1990, HLT.

[2]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[3]  David L. Waltz,et al.  Toward memory-based reasoning , 1986, CACM.

[4]  Zhang Le,et al.  Maximum Entropy Modeling Toolkit for Python and C , 2004 .

[5]  Naoto Kato,et al.  Transliteration Considering Context Information based on the Maximum Entropy Method , 2003 .

[6]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[7]  D. Kibler,et al.  Instance-based learning algorithms , 2004, Machine Learning.

[8]  Jian Su,et al.  A Joint Source-Channel Model for Machine Transliteration , 2004, ACL.

[9]  Walter Daelemans,et al.  TiMBL: Tilburg Memory-Based Learner, version 2.0, Reference guide , 1998 .

[10]  Key-Sun Choi,et al.  Automatic Transliteration and Back-transliteration by Decision Tree Learning , 2000, LREC.

[11]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[12]  Tetsuya Ishikawa,et al.  Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration , 2001, Comput. Humanit..

[13]  Tsujii Jun'ichi,et al.  Maximum entropy estimation for feature forests , 2002 .

[14]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[15]  강병주,et al.  한국어 정보검색에서 외래어와 영어로 인한 단어불일치문제의 해결 = A resolution of word mismatch problem caused by foreign word transliterations and english words in Korean information retrieval , 2001 .

[16]  Adam L. Berger,et al.  A Maximum Entropy Approach to Natural Language Processing , 1996, CL.

[17]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[18]  Vladimir Vapnik,et al.  The Nature of Statistical Learning , 1995 .

[19]  Patrick A. V. Hall,et al.  Approximate String Matching , 1994, Encyclopedia of Algorithms.

[20]  In-Ho Kang,et al.  English-to-Korean Transliteration using Multiple Unbounded Overlapping Phoneme Chunks , 2000, COLING.

[21]  Kevin Knight,et al.  Machine Transliteration , 1997, CL.

[22]  Steven Salzberg,et al.  A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features , 2004, Machine Learning.