Spectral Entropy Feature in Multi-stream for Robust ASR

In recent papers, entropy computed from sub-bands of the spectrum was used as a feature for automatic speech recognition. In the present paper, we further study the sub-band spectral entropy features, which capture the flatness or peakiness of the sub-band spectrum and, in turn, the position of the formants in the spectrum. The sub-band spectral entropy features are used in hybrid hidden Markov model/artificial neural network systems and are found to be noise robust. The spectral entropy features are also investigated alongside PLP features in a multi-stream combination. Separate multi-layer perceptrons (MLPs) are trained on the PLP features, on the spectral entropy features, and on the concatenation of the two. The output posteriors of the three MLPs are combined after weighting, such that the weight given to a particular MLP's outputs is inversely proportional to the entropy of that MLP's output posterior distribution. In the Tandem framework, the combined output, after decorrelation, is fed to a standard hidden Markov model/Gaussian mixture model system. A significant improvement in performance is reported when spectral entropy features are used along with PLP features in multi-stream combination.
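
The two ideas in the abstract, sub-band spectral entropy and inverse-entropy weighting of MLP posteriors, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the equal-width band split, the use of base-2 logarithms, the `eps` guard, and the choice of a linear weighted sum for the combination are all assumptions made for illustration; the abstract only states that the weights are inversely proportional to the posterior entropy.

```python
import numpy as np


def subband_spectral_entropy(power_spectrum, n_bands, eps=1e-12):
    """Entropy (in bits) of each sub-band of one frame's power spectrum.

    Each sub-band is normalised to sum to one, so it can be treated like a
    probability mass function. A flat band gives high entropy; a band with a
    sharp peak (e.g. a formant) gives low entropy.
    Assumption: equal-width bands obtained with np.array_split.
    """
    bands = np.array_split(np.asarray(power_spectrum, dtype=float), n_bands)
    entropies = []
    for band in bands:
        p = band / (band.sum() + eps)   # normalise the band to a PMF
        p = p[p > 0]                    # skip empty bins (0 * log 0 := 0)
        entropies.append(-np.sum(p * np.log2(p)))
    return np.array(entropies)


def inverse_entropy_combination(posteriors, eps=1e-12):
    """Combine per-stream posteriors for a single frame.

    posteriors: array of shape (n_streams, n_classes), each row summing to one.
    Each stream is weighted by the inverse of its posterior entropy, and the
    weights are normalised to sum to one.
    Assumption: the streams are merged with a linear weighted sum.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    entropy = -np.sum(posteriors * np.log2(posteriors + eps), axis=1)
    weights = 1.0 / (entropy + eps)     # confident (low-entropy) streams weigh more
    weights /= weights.sum()
    combined = weights @ posteriors     # shape (n_classes,)
    return combined, weights


if __name__ == "__main__":
    # One flat band followed by one peaky band (illustrative values only).
    frame = np.array([1.0, 1.0, 1.0, 1.0, 8.0, 0.0, 0.0, 0.0])
    print(subband_spectral_entropy(frame, n_bands=2))

    # A confident stream and an uninformative stream.
    streams = np.array([[0.90, 0.05, 0.05],
                        [1 / 3, 1 / 3, 1 / 3]])
    combined, weights = inverse_entropy_combination(streams)
    print(weights, combined)
```

In this sketch the confident stream receives the larger weight, so a stream whose MLP output has been flattened by noise contributes less to the combined posterior.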
