In this paper, we present and investigate a new method for subband-based Automatic Speech Recognition (ASR) that approximates the ideal `full combination' approach, which is itself often impractical to realize. The `full combination' approach explicitly considers all possible combinations of subbands (\cite{Hermansky96:TAO}), thereby avoiding the usually necessary independence assumption, which would limit the potential of subband-based ASR. We show how this ideal approach can be realized by a nonlinear combination function in which the fullband posterior probabilities are decomposed into a weighted sum of posterior probabilities from Artificial Neural Network (ANN) experts. This requires training one expert for each possible subband combination. To limit such extensive training, we have found that comparable results can be achieved by estimating the posteriors for each subband combination as a function of the posteriors from the individual subbands alone (\cite{Hagen98:SBS,Morris99:TFC}). We present the theoretical foundation of our solution to the ideal `full combination' approach, together with the nonlinear combination function and its approximation. The weights, which represent the relative utility of each subband combination for recognition, are very important for this technique, and possible schemes for their estimation are proposed. These schemes have been tested and compared in the framework of HMM/ANN hybrid systems on clean and noise-added data.
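The weighted-sum decomposition described above can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the abstract: $q_k$ is a class (e.g. phone) hypothesis, $x$ the fullband feature vector, $B$ the number of subbands, $c_i$ the $i$-th of the $2^B$ subband combinations (including the empty one, for which the expert output reduces to the prior $P(q_k)$), $x_{c_i}$ the features restricted to $c_i$, and $w_i$ the weight of that combination.
\begin{equation}
P(q_k \mid x) \approx \sum_{i=1}^{2^B} w_i \, P(q_k \mid x_{c_i}), \qquad w_i \ge 0, \quad \sum_{i=1}^{2^B} w_i = 1
\end{equation}
The abstract does not state which function is used to estimate the combination posteriors from the individual subband posteriors. One plausible form, assuming the subbands are conditionally independent given the class (an assumption that applies only to this approximation, not to the ideal approach), follows from Bayes' rule:
\begin{equation}
P(q_k \mid x_{c_i}) \approx \frac{P(q_k)^{1-|c_i|} \prod_{j \in c_i} P(q_k \mid x_j)}{\sum_{l} P(q_l)^{1-|c_i|} \prod_{j \in c_i} P(q_l \mid x_j)}
\end{equation}
Here $|c_i|$ is the number of subbands in combination $c_i$ and $x_j$ denotes the features of subband $j$. Under this sketch only $B$ single-subband experts need to be trained instead of $2^B$.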
[1] Misha Pavel, et al. Towards ASR on partially corrupted speech. Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, 1996.
[2] Hervé Bourlard, et al. Multi-Stream Speech Recognition. 1996.
[3] Phil D. Green, et al. Some solutions to the missing feature problem in data classification, with application to noise robust ASR. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), 1998.
[4] Richard P. Lippmann, et al. Robust Speech Recognition with Interruptions, and Noise. 1997.
[5] Hervé Bourlard, et al. The full combination sub-bands approach to noise robust HMM/ANN based ASR. EUROSPEECH, 1999.
[6] Hervé Bourlard, et al. Subband-Based Speech Recognition in Noisy Conditions: The Full Combination Approach. 1998.
[7] Hynek Hermansky, et al. Should recognizers have ears? Speech Commun., 1998.
[8] Ronald A. Cole, et al. New telephone speech corpora at CSLU. EUROSPEECH, 1995.