Weighted Finite State Transducer-Based Endpoint Detection Using Probabilistic Decision Logic

In this paper, we propose the use of data-driven probabilistic utterance-level decision logic to improve Weighted Finite State Transducer (WFST)-based endpoint detection. Endpoint detection is generally handled by two cascaded decision processes. The first is frame-level speech/non-speech classification based on statistical hypothesis testing. The second is an utterance-level speech boundary decision based on heuristic knowledge. To handle these two processes within a unified framework, we propose a WFST-based approach. However, a WFST-based approach shares two limitations with conventional approaches: the utterance-level decision relies on heuristic knowledge, and the decision parameters are tuned sequentially. Therefore, to learn the decision knowledge from a speech corpus and optimize the parameters jointly, we propose the use of data-driven probabilistic utterance-level decision logic. The proposed method reduces the average detection failure rate by about 14% across various noisy-speech corpora collected for endpoint detection evaluation.
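To make the conventional two-stage cascade concrete, here is a minimal sketch, not the authors' implementation. It assumes a simplified frame model in which samples are zero-mean Gaussian with a known variance `noise_var` under H0 (non-speech) and `noise_var + speech_var` under H1 (speech). Practical detectors usually work on spectral features and estimate these statistics adaptively. The utterance-level rules and their parameters `min_speech` and `hangover` are illustrative heuristics. All function names, parameters, and the synthetic signal generator are hypothetical.

```python
import math
import random


def frame_llr(frame, noise_var, speech_var):
    """Mean per-sample log-likelihood ratio log p(x|H1) - log p(x|H0).

    Assumed (hypothetical) model: samples are zero-mean Gaussian with variance
    noise_var under H0 (non-speech) and noise_var + speech_var under H1 (speech).
    """
    v0, v1 = noise_var, noise_var + speech_var
    total = 0.0
    for x in frame:
        total += 0.5 * math.log(v0 / v1) + 0.5 * x * x * (1.0 / v0 - 1.0 / v1)
    return total / len(frame)


def classify_frames(frames, noise_var, speech_var, threshold=0.0):
    """Stage 1: frame-level speech (True) / non-speech (False) likelihood ratio test."""
    return [frame_llr(f, noise_var, speech_var) > threshold for f in frames]


def detect_endpoints(labels, min_speech=3, hangover=5):
    """Stage 2: heuristic utterance-level boundary decision (hypothetical rules).

    Begin-point: start of the first run of `min_speech` consecutive speech frames.
    End-point: last speech frame before `hangover` consecutive non-speech frames.
    Returns (begin, end) frame indices, inclusive, or None if no speech is found.
    """
    begin = end = None
    run_start, run_len, silence_len = 0, 0, 0
    for t, is_speech in enumerate(labels):
        if begin is None:
            # Still searching for the begin-point.
            if not is_speech:
                run_len = 0
                continue
            if run_len == 0:
                run_start = t
            run_len += 1
            if run_len >= min_speech:
                begin, end = run_start, t
        elif is_speech:
            end, silence_len = t, 0  # speech resumes; reset the hangover counter
        else:
            silence_len += 1
            if silence_len >= hangover:
                break  # enough trailing non-speech: declare the end-point
    return None if begin is None else (begin, end)


def make_frames(layout, frame_len=160, noise_std=0.1, speech_std=1.0, seed=0):
    """Synthetic test signal: layout is a list of (num_frames, is_speech) segments."""
    rng = random.Random(seed)
    frames = []
    for num_frames, is_speech in layout:
        std = math.hypot(noise_std, speech_std) if is_speech else noise_std
        for _ in range(num_frames):
            frames.append([rng.gauss(0.0, std) for _ in range(frame_len)])
    return frames


if __name__ == "__main__":
    frames = make_frames([(10, False), (20, True), (15, False)])
    labels = classify_frames(frames, noise_var=0.01, speech_var=1.0)
    print(detect_endpoints(labels))  # → (10, 29)
```

The two thresholds in `detect_endpoints` are examples of the hand-set decision parameters that, as the abstract notes, conventional systems tune one after another.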
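The idea of replacing heuristic utterance-level rules with probabilities learned from a corpus can be illustrated with a small weighted automaton. The sketch below is hypothetical and is not the authors' formulation. It assumes a three-state left-to-right topology (leading non-speech, speech, trailing non-speech). Arc weights are negative log relative frequencies estimated from frame-level labels paired with reference segmentations. The best path is found by Viterbi search in the tropical semiring (min, +), the same operation that a WFST shortest-path search performs. The names `ARCS`, `estimate_weights`, and `decode_endpoints` are illustrative.

```python
import math
from collections import defaultdict

# Hypothetical left-to-right topology: 0 = leading non-speech, 1 = speech,
# 2 = trailing non-speech. ARCS[s] lists the states reachable from s.
ARCS = {0: (0, 1), 1: (1, 2), 2: (2,)}


def estimate_weights(corpus, alpha=1.0):
    """Learn arc weights -log P(next | state) and -log P(label | state).

    corpus: iterable of (frame_labels, reference_states) pairs, where the
    labels are frame-level speech decisions (bools) and the states come from
    a reference segmentation. Add-alpha smoothing avoids infinite weights.
    """
    trans, emit = defaultdict(float), defaultdict(float)
    for labels, states in corpus:
        for t, (label, state) in enumerate(zip(labels, states)):
            emit[state, label] += 1
            if t + 1 < len(states):
                trans[state, states[t + 1]] += 1
    trans_w, emit_w = {}, {}
    for s, nexts in ARCS.items():
        total = sum(trans[s, n] + alpha for n in nexts)
        for n in nexts:
            trans_w[s, n] = -math.log((trans[s, n] + alpha) / total)
        total = sum(emit[s, b] + alpha for b in (False, True))
        for b in (False, True):
            emit_w[s, b] = -math.log((emit[s, b] + alpha) / total)
    return trans_w, emit_w


def decode_endpoints(labels, trans_w, emit_w):
    """Viterbi search in the tropical semiring (min over paths, + along a path).

    The path must start in state 0. Returns the (begin, end) frame indices of
    the speech state on the lowest-cost path, or None if it never enters it.
    """
    if not labels:
        return None
    inf = float("inf")
    cost = {0: emit_w[0, labels[0]], 1: inf, 2: inf}
    backptrs = []
    for label in labels[1:]:
        new_cost, ptr = {s: inf for s in ARCS}, {}
        for s, c in cost.items():
            for n in ARCS[s]:
                cand = c + trans_w[s, n] + emit_w[n, label]
                if cand < new_cost[n]:
                    new_cost[n], ptr[n] = cand, s
        backptrs.append(ptr)
        cost = new_cost
    state = min(cost, key=cost.get)  # any state may end the utterance
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    speech = [t for t, s in enumerate(path) if s == 1]
    return (speech[0], speech[-1]) if speech else None


if __name__ == "__main__":
    corpus = [([False] * 10 + [True] * 10 + [False] * 10,
               [0] * 10 + [1] * 10 + [2] * 10)]
    trans_w, emit_w = estimate_weights(corpus)
    noisy = [False, False, True, False, False, True, True,
             False, True, True, False, False, False, False]
    print(decode_endpoints(noisy, trans_w, emit_w))  # → (5, 9)
```

In this toy example, the isolated speech label at frame 2 is absorbed into the leading non-speech state, and the one-frame gap at frame 7 is bridged. No duration threshold is set by hand to achieve this. Because every weight is derived from counts, retraining on a different corpus updates all decision parameters together rather than one at a time.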

Hoon Chung, Sung Joo Lee, Yun Keun Lee
