The Application of Classical Information Retrieval Techniques to Spoken Documents

The research presented in this thesis addresses the topic of ad hoc retrieval of information from collections of spoken items such as radio news bulletins. Modern digital computers are becoming increasingly adept at processing non-textual data, such as speech. Consequently, new methods are required to allow users to pinpoint specific items of interest in large data collections. Such a method might exploit the Hidden Markov Model (HMM), which has proved successful as the basis for many experimental speech recognition systems, and the well-understood techniques of document retrieval that have arisen from many years' research into textual information retrieval (IR). However, so far there has been little exploration of the potential combination of these methods in order to index "spoken word" data. In the IR community, several papers have put forward an approach to the problem, but this approach has not been properly tested. Work done in the speech recognition area has tended to concentrate on developing systems for topic classification. These systems are extensively pre-trained for the task of partitioning a set of spoken messages into a set of disjoint and exhaustive classes, each one representing some topic. Their utility is, in practice, limited by the fixed class set and slow operation, and they do not represent an approach to the problem of retrieving items that correspond to arbitrary topics. This thesis describes experiments combining the techniques of classical information retrieval with HMM-based speech recognition methods in order to retrieve items from a collection of spoken messages corresponding to items of radio news. In a baseline system, a new technique for wordspotting allows items matching an arbitrary expression of the information requirement to be retrieved quickly and reasonably accurately. The system is subsequently improved through the addition of appropriate language models and the use of state-of-the-art acoustic modelling.
Finally, performance is compared with that obtained by two alternative approaches, including one recently proposed in the IR literature, and found to be considerably superior.

Declaration

This thesis describes research carried out in the Speech, Vision and Robotics Group of the University of Cambridge Engineering Department between October 1991 and February 1995. It is the result of my own work and includes nothing which is the outcome of work done in collaboration. The length of this thesis, excluding references and figure captions, is forty-four thousand words.

Acknowledgements

Firstly, I wish to express my gratitude to Prof. Steve Young, my …
