Thematic indexing of spoken documents by using self-organizing maps

A method is presented for building a useful searchable index for spoken audio documents. The task differs from traditional text document indexing because large audio databases are transcribed by automatic speech recognition, and decoding errors occur frequently. The idea in this paper is to take advantage of the large size of the database and to select the best index terms for each document with the help of the other documents that lie close to it in a semantic vector space. First, the audio stream is converted into a text stream by a speech recognizer. Then the text of each story is represented in a vector space as a document vector, which is the normalized sum of the word vectors in the story. A large collection of such document vectors is used to train a self-organizing map (SOM) to find latent semantic structures in the collection. Because the stories in spoken news are short and contain speech recognition errors, the document vectors are smoothed using the semantic clusters determined by the SOM to enhance the indexing. The application in this paper is the indexing and retrieval of broadcast news from radio and television. Test results are given using the evaluation data from the Text REtrieval Conference (TREC) spoken document retrieval (SDR) task.
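
The pipeline described above can be illustrated with a minimal sketch in Python (NumPy only). It is not the paper's implementation. The random word vectors, the toy stories, the SOM grid size and learning schedule, and in particular the smoothing rule (blending each document vector with the model vector of its best-matching SOM unit, weighted by a hypothetical parameter `alpha`) are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def document_vector(words, word_vectors):
    """Normalized sum of the word vectors of the known words in a story."""
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        dim = len(next(iter(word_vectors.values())))
        return np.zeros(dim)
    total = np.sum(known, axis=0)
    norm = np.linalg.norm(total)
    return total / norm if norm > 0 else total

def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM with online updates (assumed schedule)."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(rows * cols, data.shape[1]))
    # Grid coordinates of each map unit, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.1      # shrinking neighborhood
            x = data[idx]
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian neighborhood
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def smooth(doc_vec, weights, alpha=0.5):
    """Assumed smoothing: blend a document with its best-matching unit."""
    bmu = np.argmin(np.linalg.norm(weights - doc_vec, axis=1))
    mixed = alpha * doc_vec + (1.0 - alpha) * weights[bmu]
    norm = np.linalg.norm(mixed)
    return mixed / norm if norm > 0 else mixed

# Toy data: placeholder random word vectors and hypothetical recognized stories.
rng = np.random.default_rng(0)
vocab = ["election", "vote", "senate", "storm", "rain", "flood", "goal", "match"]
word_vectors = {w: rng.normal(size=8) for w in vocab}
stories = [
    ["election", "vote", "senate"],
    ["vote", "senate", "election", "vote"],
    ["storm", "rain", "flood"],
    ["rain", "flood", "storm", "unknownword"],
    ["goal", "match"],
]
docs = np.array([document_vector(s, word_vectors) for s in stories])
som = train_som(docs, rows=2, cols=2)
smoothed = np.array([smooth(d, som) for d in docs])
```

In this sketch a word missing from the vocabulary (such as a misrecognized token) is simply skipped, and the smoothing step pulls each short, noisy document vector toward the map unit that summarizes its neighbors. With `alpha=1.0` the smoothing is disabled and the original document vector is returned.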
