Confidence measure for speech indexing based on Latent Dirichlet Allocation

This paper presents a confidence measure for speech indexing that aims to predict the indexing quality of a speech document in a Spoken Document Retrieval (SDR) task. We first describe how the indexing quality of a speech document is evaluated. We then present our method for predicting it, which combines the confidence measures provided by an automatic speech recognition system with the detection of semantic outliers, implemented with the Latent Dirichlet Allocation (LDA) model. Experiments are conducted on the French broadcast news data of the ESTER2 campaign, in a classical SDR scenario where users submit text queries to a search engine. Results show an overall improvement when the detection is done with the LDA model, and the detection rate is always above 70%.

Index Terms: speech indexing, confidence measure, spoken document retrieval, Latent Dirichlet Allocation
