Robust Automatic Speech Recognition with Missing and Unreliable Data

Automatic speech recognition (ASR) systems have made dramatic performance leaps in the recent past. Yet the notion that the key to more robust recognition is to reduce the difference between training and test conditions is still commonly held. As ASR applications move from tightly controlled to more natural environments with a varying number of unpredictable sound sources, this assumption is becoming less and less viable. Decoding the speech source of interest while listening to several sound sources at the same time seems a more accurate description of the ASR process in these challenging environments. This thesis discusses the theoretical and practical issues that arise from this viewpoint. The aim is to explore the division of the problem of robust ASR into two subproblems: (a) identification/separation of the speech and noise using speech properties alone; and (b) recognition based on the resulting partial evidence. The basic assumption is that some regions of the speech time-frequency representation remain relatively unaffected by the noise, that they can be identified, and that they alone are sufficient for ASR. Conventional techniques require models of all sources in the auditory scene, and the decoding of all of them, even when only one source is of interest; the techniques described in this thesis make no such requirement. However, they are flexible enough to use this information if it is available. Two techniques are used to adapt a conventional hidden Markov model (HMM) based ASR system to use partial evidence: (i) marginalisation of the state distributions, so that only the likelihood of the reliable regions is assessed; and (ii) imputation of the unreliable regions, replacing the unreliable features with a single point from the state-conditional distributions.
In both cases, the "counterevidence", which indicates which states are unlikely to have generated the speech underlying the noise-dominated unreliable regions, further constrains the decoding. The techniques are evaluated on the Aurora 2 connected digit recognition task, and seem to perform competitively. In the experiments, the reliable features are identified via local SNR estimates derived from stationary and adaptive on-line noise estimates. The potential of the techniques is indicated by using the clean speech to identify the reliable regions in the noisy speech, in which case accuracy is maintained even at -5 dB SNR. The simple all-or-nothing assumption (each feature is either reliable or unreliable) gives rise to a model that links recognition and separation as two interdependent sides of the search for the most likely explanation of the noisy data.
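
The ideas summarised above (a reliability mask from local SNR, marginalisation, imputation, and counterevidence) can be illustrated with a minimal sketch. It is not the implementation used in the thesis. It assumes diagonal-covariance Gaussian state distributions, a simple subtraction-based local SNR estimate with a 0 dB threshold, imputation by the state mean, and counterevidence expressed as the probability mass between a floor and the observed value (on the assumption that the noisy observation upper-bounds the underlying speech feature). All function names and parameters are hypothetical.

```python
import math

def reliability_mask(noisy, noise_est, snr_threshold_db=0.0):
    """Flag each channel as reliable when its estimated local SNR exceeds a threshold.

    noisy, noise_est: per-channel energies in the linear domain (assumed layout).
    """
    mask = []
    for y, n in zip(noisy, noise_est):
        n = max(n, 1e-10)
        speech_est = max(y - n, 1e-10)  # subtraction-based speech energy estimate
        snr_db = 10.0 * math.log10(speech_est / n)
        mask.append(snr_db > snr_threshold_db)
    return mask

def log_gauss(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def gauss_cdf(x, mean, var):
    """Cumulative distribution of a univariate Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def marginal_log_likelihood(x, mask, means, variances):
    """Marginalisation: score only the reliable dimensions of the feature vector."""
    return sum(log_gauss(xi, m, v)
               for xi, r, m, v in zip(x, mask, means, variances) if r)

def bounded_log_likelihood(x, mask, means, variances, floor=0.0):
    """Marginalisation with counterevidence (assumed form): unreliable dimensions
    contribute the state's probability mass between `floor` and the observed value."""
    total = 0.0
    for xi, r, m, v in zip(x, mask, means, variances):
        if r:
            total += log_gauss(xi, m, v)
        else:
            mass = gauss_cdf(xi, m, v) - gauss_cdf(floor, m, v)
            total += math.log(max(mass, 1e-300))  # guard against log(0)
    return total

def impute(x, mask, means):
    """Imputation (assumed variant): replace unreliable features with the state mean."""
    return [xi if r else m for xi, r, m in zip(x, mask, means)]

# Toy example: the second feature is unreliable and was observed at a low value.
x, mask = [1.0, 0.5], [True, False]
quiet_state = ([1.0, 0.0], [1.0, 1.0])  # expects little energy in channel 2
loud_state = ([1.0, 4.0], [1.0, 1.0])   # expects much more energy than observed
print(bounded_log_likelihood(x, mask, *quiet_state))
print(bounded_log_likelihood(x, mask, *loud_state))
```

In the toy example, plain marginalisation scores the two states identically, because they differ only in the unreliable dimension. The bounded variant penalises the state that expects more energy than was observed, which is the sense in which counterevidence constrains decoding.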
