The State of the Art in Human-computer Speech-based Interface Technologies

This paper gives an overview of state-of-the-art human-computer interface technologies based on automatic speech recognition and text-to-speech synthesis. It also describes the recent R&D activities of the Chinese University of Hong Kong (CUHK) in this challenging field. Speech is the most convenient and natural means of communication among human beings. By enabling the computer to listen and speak, speech technologies have empowered many important applications that improve our quality of life. Hong Kong is a trilingual society where people speak Cantonese, Putonghua, and English. While a great deal of effort has been spent on speech and language processing for English and Putonghua in Western countries and China, the Speech Research Group at CUHK is well known as one of the few pioneers who initiated extensive study of Cantonese-focused speech technologies. Cantonese is a major Chinese dialect spoken by over 70 million people in South China and Hong Kong. We shall introduce the various speech...
