Automatic transcription of courtroom speech

In this paper we describe our ongoing effort to develop a speech recognition system for transcribing courtroom hearings. Court hearings are a rich source of naturally occurring speech data, much of which is in the public domain. The presence of multiple microphones, coupled with noise and reverberation, makes the problem both rich and challenging. We have exploited the availability of multiple channels to mitigate, to some extent, the noise prevalent in courtroom speech. By using a novel technique for channel change detection, domain-specific language modeling, and unsupervised channel adaptation, we have achieved a word error rate (WER) of 36% on actual courtroom hearings. We also report on lightly supervised acoustic modeling experiments that use the “legal” transcripts of 120 hours of court hearings.
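
For readers unfamiliar with the metric, the WER quoted above is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, assuming whitespace tokenization and no text normalization; the function name `word_error_rate` is illustrative, and this is not the scoring tool used for the reported results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences for illustration only, not data from the hearings.
    print(word_error_rate("the court is now in session",
                          "the court is in the session"))
```

Because insertions count as errors, a WER computed this way can exceed 100% when the hypothesis is much longer than the reference.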
