A Maximum Entropy Approach to Chinese Pin Yin-To-Character Conversion

This paper introduces a new approach, based on the maximum entropy (ME) framework, to the Pinyin-to-character (PTC) conversion problem. In most cases, more than one Chinese character shares the same Pinyin, and the task of a PTC algorithm is to resolve this ambiguity. PTC can be regarded as classifying a Pinyin syllable as a specific character according to its context, which is represented as features in the ME model. By taking advantage of ME, both local and non-local information can be included, which improves conversion performance. Experiments show that a hit rate of 87% (without tone) is achieved.
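To make the classification view concrete, the following is a minimal sketch of a conditional ME (log-linear) classifier that picks a character for a toneless Pinyin syllable from its neighbouring characters. It is not the paper's implementation: the toy training data, the three feature templates (`pinyin`, `prev`, `next`), the function names, and the use of plain batch gradient ascent for training are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (previous char, toneless pinyin, next char) -> gold character.
# The syllable "shi" is shared by many characters, e.g. 是, 市, 事.
TRAIN = [
    (("我", "shi", "学"), "是"),
    (("他", "shi", "老"), "是"),
    (("城", "shi", "中"), "市"),
    (("都", "shi", "里"), "市"),
    (("故", "shi", "很"), "事"),
    (("同", "shi", "们"), "事"),
]

def features(context, char):
    """Binary ME features: each one pairs a piece of context with a candidate character."""
    prev_ch, pinyin, next_ch = context
    return [("pinyin", pinyin, char), ("prev", prev_ch, char), ("next", next_ch, char)]

def candidates(pinyin):
    """Characters seen with this pinyin in the toy data (assumed non-empty here)."""
    return sorted({c for ctx, c in TRAIN if ctx[1] == pinyin})

def predict_proba(weights, context):
    """p(char | context) = exp(sum of active feature weights) / normaliser."""
    scores = {c: sum(weights[f] for f in features(context, c))
              for c in candidates(context[1])}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: v / z for c, v in exps.items()}

def train(data, epochs=200, lr=0.1):
    """Batch gradient ascent on the conditional log-likelihood.
    The gradient for each feature is its observed count minus its expected count."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for context, gold in data:
            for f in features(context, gold):
                grad[f] += 1.0                      # observed count
            for c, p in predict_proba(weights, context).items():
                for f in features(context, c):
                    grad[f] -= p                    # expected count under the model
        for f, g in grad.items():
            weights[f] += lr * g
    return weights

def convert(weights, context):
    """Return the most probable character for the pinyin in this context."""
    probs = predict_proba(weights, context)
    return max(probs, key=probs.get)

weights = train(TRAIN)
print(convert(weights, ("我", "shi", "老")))
```

Because every feature is just an indicator over (context, character) pairs, non-local cues such as a word seen several positions earlier could be added as further templates in `features` without changing the training or decoding code.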
