Combining Chinese spoken term detection systems via side-information conditioned linear logistic regression

This paper examines Spoken Term Detection (STD) for the Chinese language. We propose using Linear Logistic Regression (LLR) to combine Chinese STD systems built with different decoding units, detection units, features, and phone sets. To address the missing-sample problem in STD system combination, side-information reflecting the reliability of the scores to be fused is used to condition the parameters of the standard LLR model. In addition, a two-stage combination scheme is proposed to overcome the data-sparsity problem. Experimental results show that the proposed methods significantly improve overall detection performance, achieving a relative improvement of 11.3% over the best single system.
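The idea of conditioning LLR parameters on side-information can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say which side-information is used, so the sketch assumes the simplest case, in which the side-information is the pattern of which systems produced a score for a candidate detection. One weight vector and one bias are learned per pattern. The data are synthetic, and all names (`condition_index`, `fused_logit`, `train`) are hypothetical.

```python
import numpy as np

def condition_index(present):
    """Map each row's presence pattern (bool, shape (n, k)) to an integer in [0, 2**k)."""
    k = present.shape[1]
    return (present.astype(int) * (2 ** np.arange(k))).sum(axis=1)

def fused_logit(scores, present, weights, biases):
    """Fused log-odds: bias and weights are selected by the side-information class."""
    cond = condition_index(present)
    masked = np.where(present, scores, 0.0)  # missing scores contribute nothing
    return biases[cond] + (weights[cond] * masked).sum(axis=1)

def log_loss(logit, labels):
    """Mean logistic loss, computed stably as log(1 + exp(z)) - y * z."""
    return np.mean(np.logaddexp(0.0, logit) - labels * logit)

def train(scores, present, labels, steps=500, lr=0.5):
    """Plain batch gradient descent on the logistic loss (an assumed optimizer)."""
    n, k = scores.shape
    weights = np.zeros((2 ** k, k))
    biases = np.zeros(2 ** k)
    cond = condition_index(present)
    masked = np.where(present, scores, 0.0)
    for _ in range(steps):
        logit = biases[cond] + (weights[cond] * masked).sum(axis=1)
        err = 1.0 / (1.0 + np.exp(-logit)) - labels  # d(loss)/d(logit) per sample
        grad_w = np.zeros_like(weights)
        grad_b = np.zeros_like(biases)
        np.add.at(grad_w, cond, err[:, None] * masked)  # accumulate per condition
        np.add.at(grad_b, cond, err)
        weights -= lr * grad_w / n
        biases -= lr * grad_b / n
    return weights, biases

# Synthetic example: two systems, each missing a score about 30% of the time.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400).astype(float)
scores = (2.0 * labels - 1.0)[:, None] + rng.normal(size=(400, 2))
present = rng.random((400, 2)) > 0.3

zero_w, zero_b = np.zeros((4, 2)), np.zeros(4)
initial_loss = log_loss(fused_logit(scores, present, zero_w, zero_b), labels)
weights, biases = train(scores, present, labels)
final_loss = log_loss(fused_logit(scores, present, weights, biases), labels)
```

With one parameter set per presence pattern, the number of parameters grows as 2^k in the number of systems k, which gives a sense of why data sparsity becomes a concern when many systems are combined.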
