Content-based access to spoken audio

This article describes approaches to content-based access to spoken audio with a qualitative and tutorial emphasis. We describe how the analysis, retrieval, and delivery phases contribute to making spoken audio content more accessible, and we outline outstanding research issues. We also discuss the main application domains and identify important issues for future developments. The structure of the article is based on the general system architecture for content-based access. Although the tasks within each processing stage may appear unconnected, their interdependencies and the sequence in which they take place vary.
