SpeechFind for CDP: Advances in spoken document retrieval for the U.S. Collaborative Digitization Program

This paper presents our recent advances in SpeechFind, a spoken document retrieval system designed by CRSS-UTD for the U.S. Collaborative Digitization Program (CDP). A prototype of SpeechFind currently serves as the search engine for 1,300 hours of CDP audio content spanning a wide range of acoustic conditions, vocabularies, time periods, and topics. To determine how much user-corrected transcript data is needed to impact automatic speech recognition (ASR) and audio search, we developed a web-based online interface for verifying ASR-generated transcripts. We also present the procedure for enhancing SpeechFind's transcription performance: adaptation methods for the language and acoustic models are selected according to the acoustic characteristics of the corpus under test. Experimental results on the CDP corpus demonstrate that the adopted model adaptation scheme, using the verified transcripts, is effective in improving recognition accuracy; combining feature/acoustic model enhancement with language model selection yields up to a 24.8% relative improvement in ASR performance. By integrating automatic transcript generation, online CDP transcript correction, and our transcript reliability estimator, SpeechFind provides a comprehensive support mechanism for reliable transcription and search for U.S. libraries with limited speech technology experience.
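For readers unfamiliar with the convention, "relative improvement" in ASR is measured as the fractional reduction in word error rate (WER) with respect to the baseline, not an absolute percentage-point drop. The sketch below illustrates the arithmetic; the specific baseline and adapted WER values are hypothetical examples, not figures reported in the paper.

```python
def relative_improvement(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction, expressed as a percentage of the baseline WER."""
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer

# Hypothetical illustration: a drop from 40.0% to 30.08% WER is a
# 24.8% relative improvement, even though the absolute drop is 9.92 points.
print(relative_improvement(40.0, 30.08))
```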
