Impact of variabilities on speech recognition

Major progress is being recorded regularly in both the technology and the exploitation of Automatic Speech Recognition (ASR) and spoken language systems. However, technological barriers to flexible solutions and user satisfaction remain under some circumstances. These barriers stem from several factors, such as sensitivity to the environment (background noise or channel variability) and the weak representation of grammatical and semantic knowledge. Current research also emphasizes deficiencies in dealing with the variation naturally present in speech. For instance, the lack of robustness to foreign accents precludes use by specific populations. Many factors affect speech realization: regional and sociolinguistic factors, as well as factors related to the environment or to the speaker. These create a wide range of variations that may not be modeled correctly (speaker, gender, speech rate, vocal effort, regional accents, speaking style, non-stationarity...), especially when resources for system training are scarce. This paper outlines some current advances related to variabilities in ASR.
