Paper Information - Recent Advances in Feature Extraction and Acoustic Modeling for Automatic Speech Recognition

This paper reviews progress in feature extraction and acoustic modeling for automatic speech recognition, based on contributions presented at the 1997 Institute of Electrical and Electronics Engineers International Conference on Acoustics, Speech, and Signal Processing (ICASSP-97).

Renato De Mori

[1] S. Young,et al. Lattice-based discriminative training for large vocabulary speech recognition , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[2] Sridha Sridharan,et al. Speech separation by simulating the cocktail party effect with a neural network controlled Wiener filter , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3] Satoru Hayamizu,et al. Speech enhancement using CSS-based array processing , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4] Ehud Weinstein,et al. Iterative-batch and sequential algorithms for single microphone speech enhancement , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5] Yoshua Bengio,et al. Global optimization of a neural network-hidden Markov model hybrid , 1992, IEEE Trans. Neural Networks.

[6] Renato De Mori,et al. High-performance connected digit recognition using maximum mutual information estimation , 1994, IEEE Trans. Speech Audio Process..

[7] Anthony J. Robinson,et al. Enhancement and recognition of noisy speech within an autoregressive hidden Markov model framework using noise estimates from the noisy signal , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8] Roger K. Moore,et al. Modelling asynchrony in speech using elementary single-signal decomposition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9] Jenq-Neng Hwang,et al. Joint model and feature space optimization for robust speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10] Reinhold Orglmeister,et al. A contextual blind separation of delayed and convolved sources , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[11] Hiroshi Matsumoto,et al. A frequency-weighted HMM based on minimum error classification for noisy speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12] Eric A. Woudenberg,et al. Efficient normalization based upon GPD [generalized probabilistic descent] , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13] Chin-Hui Lee,et al. On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate , 1997, IEEE Trans. Speech Audio Process..

[14] Li Deng,et al. Integrated-multilingual speech recognition using universal phonological features in a functional speech production model , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[15] Biing-Hwang Juang,et al. A study on speaker adaptation of the parameters of continuous density hidden Markov models , 1991, IEEE Trans. Signal Process..

[16] Fabrice Plante,et al. Segregation of concurrent speech with the reassigned spectrum , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[17] Yariv Ephraim,et al. A minimum mean square error approach for speech enhancement , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[18] Régine André-Obrecht,et al. Direct identification vs. correlated models to process acoustic and articulatory informations in automatic speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[19] Chin-Hui Lee,et al. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[20] Satoshi Takahashi,et al. Jacobian approach to fast acoustic model adaptation , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[21] Andrzej Drygajlo,et al. Integrated speech enhancement and coding in the time-frequency domain , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[22] Vijay Raman,et al. Robustness issues and solutions in speech recognition based telephony services , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[23] Jean-Claude Junqua,et al. Robustness improvements in continuously spelled names over the telephone , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[24] Mari Ostendorf,et al. Adaptation of polynomial trajectory segment models for large vocabulary speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[25] Yifan Gong,et al. Elimination of trajectory folding phenomenon: HMM, trajectory mixture HMM and mixture stochastic trajectory model , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[26] Etienne Barnard,et al. Smoothness analysis for trajectory features , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[27] John Makhoul,et al. Speaker adaptive training: a maximum likelihood approach to speaker normalization , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[28] Hynek Hermansky,et al. Sub-band based recognition of noisy speech , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[29] Daniel S. Benincasa,et al. Co-channel speaker separation using constrained nonlinear optimization , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[30] Mike Schuster. Acoustic model building based on non-uniform segments and bidirectional recurrent neural networks , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[31] Anders Krogh,et al. Hidden neural networks: a framework for HMM/NN hybrids , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[32] Nancy Niedzielski,et al. Enhancement of esophageal speech by injection noise rejection , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[33] Leon Cohen,et al. Frequency-warping and speaker-normalization , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[34] Leonardo Neumeyer,et al. Probabilistic optimum filtering for robust speech recognition , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[35] Tomio Takara,et al. Isolated word recognition using the HMM structure selected by the genetic algorithm , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[36] Daniel J. Mashao,et al. Utterance dependent parametric warping for a talker-independent HMM-based recognizer , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[37] Jörg Meyer,et al. Multi-channel speech enhancement in a car environment using Wiener filtering and spectral subtraction , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[38] Antonio M. Peinado,et al. A DFE-based algorithm for feature selection in speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[39] Timothy R. Anderson,et al. Binaural phoneme recognition using the auditory image model and cross-correlation , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[40] Jinhai Cai,et al. Robust pitch detection of speech signals using steerable filters , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[41] Jean-Luc Gauvain,et al. Model compensation for noises in training and test data , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[42] Josef G. Bauer. Enhanced control and estimation of parameters for a telephone based isolated digit recognizer , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[43] Martin J. Russell,et al. Linear dynamic segmental HMMs: variability representation and training procedure , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[44] Christoph Neukirchen,et al. Advanced training methods and new network topologies for hybrid MMI-connectionist/HMM speech recognition systems , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[45] Philip C. Woodland,et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[46] Stephen A. Zahorian,et al. Phone classification with segmental features and a binary-pair partitioned neural network classifier , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[47] Mark J. F. Gales,et al. Robust speech recognition in additive and convolutional noise using parallel model combination , 1995, Comput. Speech Lang..

[48] Lou Boves,et al. Phase-corrected RASTA for automatic speech recognition over the phone , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[49] Alexandros Potamianos,et al. On combining frequency warping and spectral shaping in HMM based speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[50] Keikichi Hirose,et al. Robust speech recognition based on Viterbi Bayesian predictive classification , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[51] Christoph Neukirchen,et al. Dictionary-based discriminative HMM parameter estimation for continuous speech recognition systems , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[52] Li Deng,et al. Speaker adaptation experiments using nonstationary-state hidden Markov models: a MAP approach , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[53] Pietro Laface,et al. Using word temporal structure in HMM speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[54] Qiang Hou,et al. Model adaptation based on HMM decomposition for reverberant speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[55] Y.-L. Chow. Maximum mutual information estimation of HMM parameters for continuous speech recognition using the N-best algorithm , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[56] Jayadev Billa,et al. Dual-channel auditory spectrum modeling , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[57] Brian Kan-Wing Mak. Combining ANNs to improve phone recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[58] Arthur Nadas,et al. Optimal solution of a training problem in speech recognition , 1985, IEEE Trans. Acoust. Speech Signal Process..

[59] Don X. Sun. Statistical modeling of co-articulation in continuous speech based on data driven interpolation , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[60] Yonghong Yan,et al. Speech recognition using neural networks with forward-backward probability generated targets , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[61] Chin-Hui Lee,et al. Segmental GPD training of HMM based speech recognizer , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[62] Jen-Tzung Chien,et al. Improved Bayesian learning of hidden Markov models for speaker adaptation , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[63] Renato De Mori,et al. A family of parallel hidden Markov models , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[64] Tai-Hwei Hwang,et al. Feature adaptation using deviation vector for robust speech recognition in noisy environment , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[65] Kazuyo Tanaka,et al. A method of extracting time-varying acoustic features effective for speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[66] Mervyn A. Jack,et al. Weighted matching algorithms and reliability in noise cancelling by spectral subtraction , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[67] Kuldip K. Paliwal,et al. Model parameter estimation for mixture density polynomial segment models , 1998, Comput. Speech Lang..

[68] Shigeru Katagiri,et al. HMM speech recognizer based on discriminative metric design , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[69] Etienne Barnard,et al. Explicit, N-best formant features for vowel classification , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[70] Steven Greenberg,et al. Integrating syllable boundary information into speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[71] Puming Zhan,et al. Speaker normalization based on frequency warping , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[72] Shigeki Sagayama,et al. Improved estimation of supervision in unsupervised speaker adaptation , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[73] Björn E. Ottersten,et al. Kalman filtering for low distortion speech enhancement in mobile communication , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[74] Saeed Vaseghi,et al. Multi-resolution phonetic/segmental features and models for HMM-based speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[75] Satoshi Takahashi,et al. Discrete mixture HMM , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[76] Aaron E. Rosenberg,et al. A fast algorithm for stochastic matching with application to robust speaker verification , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[77] Yunxin Zhao,et al. Co-channel speech separation for robust automatic speech recognition: stability and efficiency , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[78] Cheung-Fat Chan,et al. Quality enhancement of narrowband CELP-coded speech via wideband harmonic re-synthesis , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[79] Gary H. Whipple,et al. Model based speech pause detection , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[80] Alain Biem,et al. Cepstrum-based filter-bank design using discriminative feature extraction training at various levels , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[81] Sadaoki Furui,et al. Smoothed N-best-based speaker adaptation for speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[82] Chafic Mokbel,et al. Adapting PSN recognition models to the GSM environment by using spectral transformation , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[83] Herbert Gish,et al. Speaker-adapted training on the Switchboard Corpus , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[84] Brian Kingsbury,et al. Recognizing reverberant speech with RASTA-PLP , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[85] Wolfgang Wokurek. Time-frequency analysis of the glottal opening , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[86] Roger K. Moore. Computer Speech and Language , 1986.

[87] Kazuya Takeda,et al. A binaural speech processing method using subband-cross correlation analysis for noise robust recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[88] Javier Hernando,et al. Maximum likelihood weighting of dynamic speech features for CDHMM speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[89] Werner Kozek,et al. Time-frequency structured decorrelation of speech signals via nonseparable Gabor frames , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[90] Mark J. F. Gales,et al. A fast and flexible implementation of parallel model combination , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.