Handling overlaps in spoken term detection

Spoken term detection (STD) systems typically produce many overlapping detections, which are often handled with pragmatic heuristics, e.g. choosing the best detection to represent all the overlapping ones. In this paper we present a theoretical study based on the concept of an acceptance space. In particular, we present two confidence estimation approaches, based on the Bayesian and the evidence perspectives, respectively. Our analysis shows that each approach has its own advantages and shortcomings, and that combining them has the potential to improve confidence estimation. Experiments conducted on meeting data confirm this analysis and show a considerable performance improvement with the combined approach, particularly for out-of-vocabulary (OOV) spoken term detection with stochastic pronunciation modeling.
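For readers unfamiliar with the pragmatic baseline the abstract mentions, the sketch below shows one way to "choose the best detection to represent all the overlaps". It is an illustrative assumption, not the method proposed in the paper: the `Detection` record, the function name `keep_best_of_overlaps`, and the rule that transitively chained overlaps of the same term form one group are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Hypothetical record for one detected occurrence of a search term."""
    term: str
    start: float  # seconds
    end: float    # seconds
    score: float  # confidence, e.g. a lattice posterior in [0, 1]


def keep_best_of_overlaps(dets: List[Detection]) -> List[Detection]:
    """Baseline heuristic (assumed): group chains of overlapping detections
    of the same term and keep only the highest-scoring detection per group."""
    result: List[Detection] = []
    # Sort by term, then start time, so overlapping detections become adjacent.
    ordered = sorted(dets, key=lambda d: (d.term, d.start))
    group: List[Detection] = []
    group_end = float("-inf")
    for d in ordered:
        if group and d.term == group[0].term and d.start < group_end:
            # Overlaps the current group: extend the group's time span.
            group.append(d)
            group_end = max(group_end, d.end)
        else:
            # Close the previous group and emit its best detection.
            if group:
                result.append(max(group, key=lambda x: x.score))
            group = [d]
            group_end = d.end
    if group:
        result.append(max(group, key=lambda x: x.score))
    return result


# Usage with made-up detections: the first three overlap and collapse to one.
example = [
    Detection("meeting", 1.00, 1.40, 0.55),
    Detection("meeting", 1.10, 1.50, 0.80),
    Detection("meeting", 1.45, 1.90, 0.30),
    Detection("meeting", 5.00, 5.40, 0.60),
]
for det in keep_best_of_overlaps(example):
    print(det)
```

Here the three overlapping hypotheses between 1.00 s and 1.90 s are reduced to the single 0.80-score detection, and the isolated detection at 5.00 s is kept. Note that the scores of the discarded detections are simply thrown away; how the evidence from overlapping detections should instead contribute to a confidence estimate is the question the paper's acceptance-space analysis addresses.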