Morphologically Filtered Power-Normalized Cochleograms as Robust, Biologically Inspired Features for ASR
Francisco J. Valverde-Albacete | Carmen Peláez-Moreno | Ascensión Gallardo-Antolín | Fernando de-la-Calle-Silos
[1] John H. L. Hansen, et al. A new perceptually motivated MVDR-based acoustic front-end (PMVDR) for robust automatic speech recognition, 2008, Speech Commun.
[2] Serajul Haque. Utilizing auditory masking in automatic speech recognition, 2010, 2010 International Conference on Audio, Language and Image Processing.
[3] Ian C. Bruce, et al. Auditory nerve model for predicting performance limits of normal and impaired listeners, 2001.
[4] B. Moore, et al. Suggested formulae for calculating auditory-filter bandwidths and excitation patterns, 1983, The Journal of the Acoustical Society of America.
[5] W. A. Munson, et al. Loudness, Its Definition, Measurement and Calculation, 2004.
[6] Hynek Hermansky, et al. Perceptually based linear predictive analysis of speech, 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.
[7] Mark Weiser, et al. Source Code, 1987, Computer.
[8] Ron Cole, et al. The ISOLET spoken letter database, 1990.
[9] Yi Hu, et al. Incorporating a psychoacoustical model in frequency domain speech enhancement, 2004, IEEE Signal Processing Letters.
[10] Steve Young. The HTK book, 1995.
[11] David Malah, et al. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator, 1984, IEEE Trans. Acoust. Speech Signal Process.
[12] Hynek Hermansky, et al. RASTA processing of speech, 1994, IEEE Trans. Speech Audio Process.
[13] Birger Kollmeier, et al. Hooking up spectro-temporal filters with auditory-inspired representations for robust automatic speech recognition, 2012, INTERSPEECH.
[14] Martin Heckmann, et al. A hierarchical framework for spectro-temporal feature extraction, 2011, Speech Commun.
[15] Edward R. Dougherty, et al. Hands-on Morphological Image Processing, 2003.
[16] David Pearce, et al. The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions, 2000, INTERSPEECH.
[17] Diego H. Milone, et al. Bioinspired sparse spectro-temporal representation of speech for robust classification, 2012, Comput. Speech Lang.
[18] Francisco J. Valverde-Albacete, et al. ASR feature extraction with morphologically-filtered power-normalized cochleograms, 2014, INTERSPEECH.
[19] Richard M. Stern, et al. Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition, 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[20] E. Zwicker, et al. Analytical expressions for critical-band rate and critical bandwidth as a function of frequency, 1980.
[21] E. Zwicker, et al. Inverse frequency dependence of simultaneous tone-on-tone masking patterns at low levels, 1982, The Journal of the Acoustical Society of America.
[22] Richard M. Stern, et al. Hearing Is Believing: Biologically Inspired Methods for Robust Automatic Speech Recognition, 2012, IEEE Signal Processing Magazine.
[23] John H. L. Hansen, et al. Morphological constrained feature enhancement with adaptive cepstral compensation (MCE-ACC) for speech recognition in noise and Lombard effect, 1994, IEEE Trans. Speech Audio Process.
[24] Richard F. Lyon, et al. A computational model of filtering, detection, and compression in the cochlea, 1982, ICASSP.
[25] Hervé Bourlard, et al. Hybrid HMM/ANN Systems for Speech Recognition: Overview and New Research Directions, 1997, Summer School on Neural Networks.
[26] William M. Hartmann. Psychoacoustics: Facts and Models, 2001.
[27] Tuomas Virtanen, et al. Modelling spectro-temporal dynamics in factorisation-based noise-robust automatic speech recognition, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Birger Kollmeier, et al. Amplitude modulation spectrogram based features for robust speech recognition in noisy and reverberant environments, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Stephanie Seneff. A joint synchrony/mean-rate model of auditory speech processing, 1990.
[30] Roy D. Patterson. Auditory images: How complex sounds are represented in the auditory system, 2000.
[31] Richard F. Lyon. A computational model of binaural localization and separation, 1983, ICASSP.
[32] Keith Vertanen. Baseline WSJ Acoustic Models for HTK and Sphinx: Training Recipes and Recognition Experiments, 2007.
[33] Marc René Schädler. Comparing Different Flavors of Spectro-Temporal Features for ASR, 2011, INTERSPEECH.
[34] Shantanu Chakrabartty, et al. Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech Recognition, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[35] Pascal Scalart, et al. Speech enhancement based on a priori signal to noise estimation, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[36] Rhee Man Kil, et al. Auditory processing of speech signals for robust speech recognition in real-world noisy environments, 1999, IEEE Trans. Speech Audio Process.
[37] Alfred Mertins, et al. Contextual invariant-integration features for improved speaker-independent speech recognition, 2011, Speech Commun.
[38] Birger Kollmeier, et al. Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition, 2011, Speech Commun.
[39] Oded Ghitza, et al. Auditory nerve representation as a front-end for speech recognition in a noisy environment, 1986.
[40] W. Jesteadt, et al. Forward masking as a function of frequency, masker level, and signal delay, 1982, The Journal of the Acoustical Society of America.
[41] Janet M. Baker, et al. The Design for the Wall Street Journal-based CSR Corpus, 1992, HLT.
[42] M. Picheny. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences, 2017.
[43] S. Seneff. A joint synchrony/mean-rate model of auditory speech processing, 1990.
[44] J. Allen. Cochlear modeling, 1985, IEEE ASSP Magazine.
[45] Brian R. Glasberg. Derivation of auditory filter shapes from notched-noise data, 1990, Hearing Research.
[46] E. B. Newman. A Scale for the Measurement of the Psychological Magnitude Pitch, 1937.
[47] Carmen Peláez-Moreno, et al. Morphological Processing of Spectrograms for Speech Enhancement, 2011, NOLISP.
[48] K. K. Paliwal, et al. Auditory masking based acoustic front-end for robust speech recognition, 1997, TENCON '97 Brisbane - Australia. Proceedings of IEEE TENCON '97. IEEE Region 10 Annual Conference. Speech and Image Technologies for Computing and Telecommunications (Cat. No. 97CH36162).
[49] Francisco J. Valverde-Albacete, et al. Auditory-Inspired Morphological Processing of Speech Spectrograms: Applications in Automatic Speech Recognition and Speech Enhancement, 2013, Cognitive Computation.
[50] R. Patterson. Complex Sounds and Auditory Images, 1992.
[51] Jan Van der Spiegel, et al. Robust auditory-based speech processing using the average localized synchrony detection, 2002, IEEE Trans. Speech Audio Process.
[52] Y. Ephraim, et al. Speech enhancement using a minimum mean square error short-time spectral amplitude estimator, 1984.
[53] Georg v. Békésy. On the Resonance Curve and the Decay Period at Various Points on the Cochlear Partition, 1949.
[54] Hazarathaiah Malepati. Speech and Audio Processing, 2010.
[55] G. Matheron, et al. The Birth of Mathematical Morphology, 2002.
[56] B. Moore, et al. A revised model of loudness perception applied to cochlear hearing loss, 2004, Hearing Research.
[57] Richard M. Schwartz, et al. Enhancement of speech corrupted by acoustic noise, 1979, ICASSP.