A bilingual Mandarin/Taiwanese (Min-nan), large-vocabulary, continuous speech recognition system based on the Tong-yong Phonetic Alphabet (TYPA)

In this paper, we describe the first bilingual Mandarin/Taiwanese (Min-nan) continuous speech recognition system for large-vocabulary or vocabulary-independent applications. We describe a phonetic transcription system, the Tong-yong Phonetic Alphabet (TYPA), and use it to transcribe the bilingual Mandarin/Taiwanese lexicons. Right-Context-Dependent (RCD) phonetic continuous-density hidden Markov models (CHMMs) based on TYPA serve as the acoustic models. To evaluate the recognizer, a lexicon tree containing 40,000 bilingual words is used as the search network. The recognizer achieves a word accuracy of 92.55% in a speaker-dependent test. Furthermore, we build a real-time continuous-speech demonstration system on top of the vocabulary-independent RCD models for one application domain, automated hospital appointment booking, where mixed Mandarin/Taiwanese speech is likely to occur.
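A lexicon tree (prefix tree) lets words whose pronunciations begin with the same phones share a single path, so the decoder evaluates each shared phone model once rather than once per word. The sketch below illustrates that idea only. It is not the authors' implementation, and the toy lexicon, its `M:`/`T:` language tags and its phone strings are hypothetical placeholders rather than real TYPA transcriptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TreeNode:
    """One node of the lexicon tree; each outgoing arc is labelled with a phone."""
    children: dict[str, TreeNode] = field(default_factory=dict)
    words: list[str] = field(default_factory=list)  # words whose pronunciation ends here


def build_lexicon_tree(lexicon: dict[str, list[str]]) -> TreeNode:
    """Insert every pronunciation, reusing existing arcs for shared phone prefixes."""
    root = TreeNode()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, TreeNode())
        node.words.append(word)
    return root


def count_arcs(node: TreeNode) -> int:
    """Count phone arcs, i.e. the phone-model instances a tree search would evaluate."""
    return sum(1 + count_arcs(child) for child in node.children.values())


def lookup(root: TreeNode, phones: list[str]) -> list[str]:
    """Return the words whose full pronunciation is exactly `phones` (empty if none)."""
    node = root
    for phone in phones:
        node = node.children.get(phone)
        if node is None:
            return []
    return node.words


# Hypothetical toy bilingual lexicon: "M:" marks Mandarin, "T:" marks Taiwanese.
# The phone strings are placeholders, NOT real TYPA transcriptions.
toy_lexicon = {
    "M:word_a": ["s", "i", "n"],
    "M:word_b": ["s", "i", "ng"],
    "T:word_c": ["s", "i", "n", "a"],
    "T:word_d": ["t", "ai"],
}

if __name__ == "__main__":
    tree = build_lexicon_tree(toy_lexicon)
    flat_arcs = sum(len(p) for p in toy_lexicon.values())
    print(flat_arcs, count_arcs(tree))  # prints "12 7"
    print(lookup(tree, ["s", "i", "n", "a"]))  # prints "['T:word_c']"
```

In this toy case a flat word list needs 12 phone arcs, while the tree needs only 7, because Mandarin and Taiwanese entries beginning with the same phones share one path. The sketch uses context-independent phone labels. With right-context-dependent models, each arc's label would also depend on the following phone, so two words would share an arc only where they agree on both that phone and the next one.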