Extracting Key Semantic Terms from Chinese Speech Query for Web Searches

This paper discusses the challenges of performing information retrieval on the Web using Chinese natural language speech queries and proposes a solution. The main contribution of this research is a divide-and-conquer strategy that mitigates speech recognition errors. It uses a query model to facilitate extraction of the core semantic string (CSS) from a Chinese natural language speech query. It then breaks the CSS into basic components corresponding to phrases and uses a multi-tier strategy to map these components to known phrases, further eliminating errors. The resulting system has been found to be effective.
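
The pipeline described above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the prefix and suffix patterns standing in for the query model, the connective characters used to split the CSS, the small phrase lexicon, and the choice of tiers (exact match, then fuzzy string match via Python's `difflib`, then leaving the component unchanged). The paper's actual query model and tiers may differ.

```python
import difflib
import re

# Hypothetical query-model patterns: request phrases that wrap the CSS.
PREFIXES = ["请帮我找", "我想查", "请找"]
SUFFIXES = ["的资料", "的信息", "的新闻"]

# Hypothetical lexicon of known phrases.
KNOWN_PHRASES = ["北京", "大学", "语音识别", "信息检索"]


def extract_css(query):
    """Strip one known request prefix and one known suffix, leaving the CSS."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if query.startswith(prefix):
            query = query[len(prefix):]
            break
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if query.endswith(suffix):
            query = query[:-len(suffix)]
            break
    return query


def split_components(css):
    """Divide the CSS into components at connectives (assumed: 和, 与)."""
    return [part for part in re.split("[和与]", css) if part]


def map_component(component, cutoff=0.6):
    """Map one component to a known phrase using successive tiers."""
    # Tier 1: exact match against the lexicon.
    if component in KNOWN_PHRASES:
        return component
    # Tier 2: closest known phrase by character-level similarity.
    close = difflib.get_close_matches(component, KNOWN_PHRASES, n=1, cutoff=cutoff)
    if close:
        return close[0]
    # Tier 3: no acceptable match, so keep the component as recognized.
    return component


def extract_key_terms(query):
    """Run the full sketch: CSS extraction, splitting, and mapping."""
    return [map_component(c) for c in split_components(extract_css(query))]


if __name__ == "__main__":
    # "检素" simulates a recognition error for "检索".
    print(extract_key_terms("请帮我找语音识别和信息检素的资料"))
```

In this sketch, handling each component separately means a recognition error in one phrase does not affect the mapping of the others, which is the intuition behind the divide-and-conquer strategy.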
