SCANMail: browsing and searching speech data by content

Increasing amounts of public, corporate, and private audio data are available, but their usefulness is limited by the lack of tools for browsing and searching them. In this paper, we describe SCANMail, a system that employs automatic speech recognition, information retrieval, information extraction, and human-computer interaction technology to permit users to browse and search their voicemail messages by content through a graphical user interface. The SCANMail client provides note-taking capabilities as well as browsing and querying features. A CallerId server proposes caller names from existing caller acoustic models and is trained from user feedback. An Email server sends the original message plus its transcription to a mailing address specified in the user’s profile.
