Paper information - English spoken term detection in multilingual recordings

English spoken term detection in multilingual recordings

This paper investigates the automatic detection of English spoken terms in real lecture recordings in a multilingual scenario. Spoken Term Detection (STD) is based on a large-vocabulary continuous speech recognition (LVCSR) system whose output is represented as word lattices, which are then searched for the required terms. The processed lectures consist mainly of English, French and Italian recordings, and the language can also change within a single recording. The English STD system therefore uses an Out-Of-Language (OOL) detection module to filter out non-English input segments. OOL detection is evaluated with respect to various confidence measures estimated from the word lattices. Experiments with OOL detection followed by English STD are performed on several hours of multilingual recordings. The combined OOL+STD system achieves a significant improvement over a stand-alone STD system (a relative improvement of more than 50% in equal error rate, EER). Finally, an additional modality, text slides in the form of PowerPoint presentations, is exploited to improve STD.

Fabio Valente | Petr Motlícek | Philip N. Garner
