Paper information - Switching linear dynamic transducer for stereo data based speech feature mapping

The performance of a speech recognition system may degrade even in the absence of background noise, because of linear or non-linear distortions introduced by recording devices or reverberation. A well-known approach to reducing this channel distortion is feature mapping, which maps each distorted speech feature to its clean counterpart. The mapping rule is usually trained on a set of stereo data, which consists of simultaneous recordings obtained in both the reference and target conditions. In this paper, we propose a novel approach to speech feature sequence mapping based on the switching linear dynamic transducer (SLDT). The proposed algorithm enables sequence-to-sequence mapping in a systematic way, instead of the traditional vector-to-vector mapping. The proposed approach is applied to compensate for channel distortion in speech recognition and shows improved recognition performance.
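To make the idea of stereo-data feature mapping concrete, the following is a minimal sketch of the traditional vector-to-vector baseline that the abstract contrasts with, not the SLDT itself. It assumes a per-dimension affine map fitted by least squares on synthetic paired frames. The function names (`fit_affine_per_dim`, `map_frame`, `mean_squared_error`), the simulated channel, and all numeric values are hypothetical and chosen only for illustration.

```python
import random


def fit_affine_per_dim(distorted, clean):
    """Fit clean[d] ~ gain[d] * distorted[d] + offset[d] for each dimension.

    Uses the closed-form least-squares solution of simple linear regression,
    applied independently to every feature dimension (an assumption made
    here for brevity; it ignores correlations across dimensions and time).
    """
    n = len(distorted)
    dims = len(distorted[0])
    gains, offsets = [], []
    for d in range(dims):
        xs = [frame[d] for frame in distorted]
        ys = [frame[d] for frame in clean]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        gain = cov_xy / var_x
        gains.append(gain)
        offsets.append(mean_y - gain * mean_x)
    return gains, offsets


def map_frame(frame, gains, offsets):
    """Map one distorted feature vector to an estimate of its clean version."""
    return [g * x + b for x, g, b in zip(frame, gains, offsets)]


def mean_squared_error(a, b):
    """Average squared difference over all frames and dimensions."""
    total = sum((x - y) ** 2 for fa, fb in zip(a, b) for x, y in zip(fa, fb))
    return total / (len(a) * len(a[0]))


# Synthetic "stereo" data: each clean frame has a simultaneously recorded
# distorted counterpart. The channel below is a made-up per-dimension
# gain and offset plus a small amount of noise.
rng = random.Random(0)
channel_gain = [0.8, 1.2, 0.5]
channel_offset = [0.3, -0.1, 0.7]
clean = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(500)]
distorted = [
    [g * c + o + rng.gauss(0.0, 0.01)
     for c, g, o in zip(frame, channel_gain, channel_offset)]
    for frame in clean
]

# Train the mapping on the stereo pairs, then apply it frame by frame.
gains, offsets = fit_affine_per_dim(distorted, clean)
restored = [map_frame(frame, gains, offsets) for frame in distorted]

mse_before = mean_squared_error(distorted, clean)
mse_after = mean_squared_error(restored, clean)
print(f"MSE before mapping: {mse_before:.4f}")
print(f"MSE after mapping:  {mse_after:.6f}")
```

Each frame here is mapped independently of its neighbours, which is the limitation the abstract attributes to vector-to-vector mapping. A sequence-to-sequence approach would instead use the temporal context of the whole feature sequence.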
