Automatic Speech Recognition and its Application to Information Extraction

This paper describes recent progress in speech recognition technology and the author's perspectives on it. Applications of speech recognition technology can be classified into two main areas: dictation and human-computer dialogue systems. In the dictation domain, automatic broadcast news transcription is now being actively investigated, especially under the DARPA project. Broadcast news dictation technology has recently been integrated with information extraction and retrieval technology, and many application systems, such as automatic voice document indexing and retrieval systems, are under development. In the human-computer interaction domain, a variety of experimental systems for information retrieval through spoken dialogue are being investigated. In spite of the remarkable recent progress, we are still far from our ultimate goal of understanding free conversational speech uttered by any speaker in any environment. This paper also describes the most important research issues that must be addressed in order to advance toward this goal of fluent speech recognition.

pattern recognition paradigm, a data-driven approach which makes use of a rich set of speech utterances from a large population of speakers, the use of stochastic acoustic and language modeling, and the use of dynamic programming-based search methods. A series of (D)ARPA projects has been a major driving force behind the recent progress in research on large-vocabulary, continuous-speech recognition. Specifically, dictation of read newspaper speech, such as North American business newspapers including the Wall Street Journal (WSJ), and conversational speech recognition using an Air Travel Information System (ATIS) task were actively investigated. More recent DARPA programs address broadcast news dictation and natural conversational speech recognition using the Switchboard and Call Home tasks. Research on human-computer dialogue systems has also started under the Communicator program [1].
Various other systems, stimulated by the DARPA projects, have been actively investigated in the US, Europe, and Japan. Most of them can be classified as either dictation systems or human-computer dialogue systems.

[1] Sadaoki Furui, et al. Designing a multimodal dialogue system for information retrieval, 1998, ICSLP.

[2] Alexander H. Waibel, et al. The interactive systems labs view4you video indexing system, 1998, ICSLP.

[3] Z. Harris. Co-Occurrence and Transformation in Linguistic Structure, 1957.

[4] Sadaoki Furui, et al. Future directions in speech information processing, 1998.

[5] Chin-Hui Lee, et al. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, 1994, IEEE Trans. Speech Audio Process.

[6] Sadaoki Furui, et al. Improvements in Japanese Broadcast News Transcription, 1999.

[7] Victor Zue, et al. GALAXY-II: a reference architecture for conversational system development, 1998, ICSLP.

[8] Jean-Luc Gauvain, et al. Speech recognition for an information kiosk, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[9] Julia Hirschberg, et al. SCAN - speech content based audio navigator: a system overview, 1998, ICSLP.

[10] Jean-Luc Gauvain, et al. The LIMSI RailTel System: Field trial of a telephone service for rail travel information, 1997, Speech Commun.

[11] Alexander G. Hauptmann, et al. Speech recognition and information retrieval: experiments in retrieving spoken documents, 1997.

[12] Shigeki Sagayama, et al. Speaker adaptation based on transfer vector field smoothing with continuous mixture density HMMs, 1992, ICSLP.

[13] Jean-Luc Gauvain, et al. The LIMSI ARISE system for train travel information, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[14] Philip C. Woodland, et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, 1995, Comput. Speech Lang.