The ISL View4You Broadcast News Transcription System

In this paper, we introduce 'View4You', the multimedia data indexing and retrieval system of the Interactive Systems Laboratories (ISL). The main components of the system, namely the segmenter, the speech recognizer and the information retrieval engine, are described in detail.

In the View4You system, public television newscasts are recorded daily. Each newscast is automatically segmented, and an index is created for each segment by means of automatic speech recognition. The user can query the system in natural language, and the system returns a list of segments sorted by relevance to the query. By selecting a segment, the user can watch the corresponding part of the news show on his or her computer screen.

We describe several end-to-end evaluations on real-world data, using questions from naive users. By replacing each component of the system with a perfect (manually simulated) one, the effect of that component's imperfections on the end-to-end result can be determined. We show that the information retrieval component has the largest impact on system performance, followed by the segmentation. The quality of the speech recognizer matters comparatively little, as long as its error rate stays below approximately 25%.
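The retrieval step described above, returning segments sorted by relevance to a natural-language query, can be illustrated with a minimal sketch. The abstract does not say which retrieval model View4You uses, so this example assumes plain TF-IDF weighting with cosine similarity over the recognizer's segment transcripts. The functions `tokenize` and `rank_segments` and the sample segment texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_segments(query: str, segments: dict[str, str]) -> list[tuple[str, float]]:
    """Return (segment_id, score) pairs sorted by descending relevance.

    Assumption: TF-IDF weighting with cosine similarity stands in for the
    (unspecified) View4You retrieval engine; `segments` maps a segment id
    to its speech-recognizer transcript.
    """
    docs = {seg_id: Counter(tokenize(text)) for seg_id, text in segments.items()}
    n_docs = len(docs)

    # Document frequency: number of segments that contain each term.
    df: Counter = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n_docs) / (1 + freq)) + 1.0 for term, freq in df.items()}

    def weight(counts: Counter) -> dict[str, float]:
        # Term frequency times IDF; terms unseen in the collection are dropped.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def norm(vec: dict[str, float]) -> float:
        return math.sqrt(sum(v * v for v in vec.values()))

    q_vec = weight(Counter(tokenize(query)))
    q_norm = norm(q_vec)

    scores = []
    for seg_id, counts in docs.items():
        d_vec = weight(counts)
        dot = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
        denom = q_norm * norm(d_vec)
        scores.append((seg_id, dot / denom if denom else 0.0))

    # Highest score first; ties broken by segment id so the order is deterministic.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


if __name__ == "__main__":
    segments = {
        "seg1": "the chancellor announced new tax reforms in berlin",
        "seg2": "flooding continues in the south after heavy rain",
        "seg3": "football results from the weekend",
    }
    for seg_id, score in rank_segments("tax reform berlin", segments):
        print(seg_id, round(score, 3))
```

Cosine normalization keeps long segments from outranking short ones merely because they contain more words. A real engine would likely also need stemming (here `reform` does not match `reforms`) and stop-word handling.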
