Wavelet analysis for robust speech processing and applications: applications of discrete wavelet transform and wavelet denoising to speech classification, speech enhancement and robust speech recognition

In this work, we study the application of wavelet analysis to robust speech processing. Reliable time-scale (TS) features are extracted that characterize the relevant phonetic classes: voiced (V), unvoiced (UV), silence (S), mixed-excitation, and stop sounds. With neural-network and Bayesian-network classifiers, only 7 TS features yield classification rates mostly similar to those obtained with 13 mel-frequency cepstral coefficient (MFCC) features. The TS features are further enhanced to design a reliable, low-complexity V/UV/S classifier, in which quantile filtering and slope tracking are used to derive adaptive thresholds. A robust voice activity detector is then built and used as a pre-processing stage to improve the performance of a speaker verification system. Based on wavelet shrinkage, a statistical wavelet filtering (SWF) method is designed for speech enhancement. Non-stationary and colored noise is handled by quantile filtering and time-frequency adaptive weighting. A newly proposed comparison diagnostic test and other subjective tests show improvements over other denoising methods. The SWF is further optimized to enhance speech quality for robust automatic speech recognition (ASR). By changing the shape of the frequency weighting and estimating perceptual noise thresholds in critical subbands, the perceptual SWF method performs almost on par with the ETSI baseline for car noise and improves significantly on other methods in aircraft maintenance factory conditions.
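To make the idea of wavelet shrinkage with a quantile-derived threshold more concrete, the following is a minimal sketch in plain Python. It is not the SWF method described above: the one-level Haar transform, the function names, and the choice of taking the threshold as a quantile of the absolute detail coefficients (parameters `q` and `scale`) are all assumptions made for illustration only.

```python
import math


def haar_dwt(x):
    """One-level orthonormal Haar DWT of an even-length sequence."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x


def quantile(values, q):
    """Empirical q-quantile, picking the nearest index in the sorted values."""
    ordered = sorted(values)
    idx = int(round(q * (len(ordered) - 1)))
    idx = min(len(ordered) - 1, max(0, idx))
    return ordered[idx]


def soft_threshold(c, t):
    """Soft thresholding: shrink the magnitude of c by t, clipping at zero."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def denoise(x, q=0.5, scale=1.0):
    """Shrink the Haar detail coefficients of x.

    Assumption for illustration: the threshold is scale times the
    q-quantile of the absolute detail coefficients of this frame.
    """
    approx, detail = haar_dwt(x)
    t = scale * quantile([abs(d) for d in detail], q)
    shrunk = [soft_threshold(d, t) for d in detail]
    return haar_idwt(approx, shrunk)


if __name__ == "__main__":
    noisy = [1.0, 1.2, 0.9, 1.1, 5.0, 5.3, 4.8, 5.1]
    print(denoise(noisy, q=0.5))
```

Only the detail coefficients are shrunk, so the coarse signal shape carried by the approximation coefficients is left untouched. A practical system would typically use a multi-level or wavelet packet decomposition and track the quantile over time in each subband rather than within a single frame.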
