Speech Transcript Evaluation for Information Retrieval

Speech recognition transcripts are used in various fields of research and in practical applications, which place differing demands on their accuracy. Traditionally, ASR research has used intrinsic evaluation measures such as word error rate to determine transcript quality. In non-dictation applications such as speech retrieval, extrinsic (task-specific) measures are more appropriate: indexing and the associated processing may eliminate certain errors, whereas the search query may reveal others. In this work, we argue that the standard extrinsic speech retrieval measure, average precision, is impractical for ASR evaluation. As an alternative, we propose the use of rank correlation measures on the output of the speech retrieval task, with the goal of predicting relative mean average precision. The measures we used showed a reasonably high correlation with average precision, but require much less human effort to calculate and can be deployed more easily in a variety of real-life settings.
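The idea of comparing retrieval output by rank correlation can be illustrated with a minimal sketch. It assumes two ranked result lists for the same query, one obtained from a reference transcript and one from an ASR transcript, and computes Kendall's tau over the documents that appear in both. The function name, the document identifiers, and the choice to ignore documents missing from either list are assumptions made for illustration; the abstract does not specify which rank correlation measures were used or how non-overlapping results were handled.

```python
from itertools import combinations


def kendall_tau(reference_ranking, asr_ranking):
    """Kendall's tau-a between two ranked lists of document IDs.

    Only documents present in both lists are compared (an assumption of
    this sketch). Each list is assumed to contain no duplicates.
    """
    asr_pos = {doc: i for i, doc in enumerate(asr_ranking)}
    # Shared documents, kept in reference order.
    shared = [doc for doc in reference_ranking if doc in asr_pos]
    if len(shared) < 2:
        raise ValueError("need at least two shared documents")

    concordant = discordant = 0
    for a, b in combinations(shared, 2):
        # a precedes b in the reference ranking by construction.
        if asr_pos[a] < asr_pos[b]:
            concordant += 1
        else:
            discordant += 1

    n_pairs = len(shared) * (len(shared) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical result lists for a single query.
reference = ["d1", "d2", "d3", "d4"]
asr = ["d1", "d3", "d2", "d4"]
tau = kendall_tau(reference, asr)
```

In this example one of the six document pairs is swapped, so `tau` is (5 - 1) / 6, roughly 0.67. A value of 1 means the ASR-based ranking matches the reference ranking exactly, and -1 means it is fully reversed. No relevance judgments are needed for this computation, which is the source of the reduced human effort mentioned above.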
