Discriminant Training of Front-End and Acoustic Modeling Stages to Heterogeneous Acoustic Environmen
[1] Steven Greenberg,et al. Temporal constraints on speech intelligibility as deduced from exceedingly sparse spectral representations , 1999, EUROSPEECH.
[2] Adam Krzyżak,et al. Methods of combining multiple classifiers and their applications to handwriting recognition , 1992, IEEE Trans. Syst. Man Cybern..
[3] Jonathan G. Fiscus,et al. A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.
[4] Jont B. Allen,et al. How do humans process and recognize speech? , 1993, IEEE Trans. Speech Audio Process..
[5] Steven Greenberg,et al. Speech intelligibility in the presence of cross-channel spectral asynchrony , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[6] Reinhold Häb-Umbach,et al. LDA derived cepstral trajectory filters in adverse environmental conditions , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[7] Nikki Mirghafori,et al. Transmissions and transitions: a study of two common assumptions in multi-band ASR , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[8] R. Schwartz,et al. The N-best algorithms: an efficient and exact procedure for finding the N most likely sentence hypotheses , 1990, International Conference on Acoustics, Speech, and Signal Processing.
[9] Sadaoki Furui,et al. Speaker-independent isolated word recognition using dynamic features of speech spectrum , 1986, IEEE Trans. Acoust. Speech Signal Process..
[10] Steven Greenberg,et al. Integrating syllable boundary information into speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[11] S. Chiba,et al. Dynamic programming algorithm optimization for spoken word recognition , 1978 .
[12] Misha Pavel,et al. Towards ASR on partially corrupted speech , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[13] Brian Hanson,et al. Robust speaker-independent word recognition using static, dynamic and acceleration features: experiments with Lombard and noisy speech , 1990, International Conference on Acoustics, Speech, and Signal Processing.
[14] Alexander H. Waibel,et al. Improving connected letter recognition by lipreading , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[15] M. M. Cohen,et al. What can visual speech synthesis tell visual speech recognition? , 1994, Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers.
[16] Barry Y. Chen,et al. On data-derived temporal processing in speech feature extraction , 2000, INTERSPEECH.
[17] Brian Kingsbury,et al. Spert-II: A Vector Microprocessor System , 1996, Computer.
[18] Sarel van Vuuren,et al. Data based filter design for RASTA-like channel normalization in ASR , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[19] R. Plomp,et al. Effect of temporal envelope smearing on speech reception. , 1994, The Journal of the Acoustical Society of America.
[20] Gethin Williams,et al. Knowing What You Don't Know: Roles for Confidence Measures in Automatic Speech Recognition , 1999 .
[21] Steven Greenberg,et al. Performance improvements through combining phone- and syllable-scale information in automatic speech recognition , 1998, ICSLP.
[22] Steven Greenberg,et al. The modulation spectrogram: in pursuit of an invariant representation of speech , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[23] E. Zwicker,et al. Analytical expressions for critical‐band rate and critical bandwidth as a function of frequency , 1980 .
[24] S. Howard Bartley,et al. The relation of pitch to frequency. , 1950 .
[25] Jeff A. Bilmes,et al. Joint distributional modeling with cross-correlation based features , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.
[26] Richard O. Duda,et al. Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.
[27] H Hermansky,et al. Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.
[28] Sarel van Vuuren,et al. Relevance of time-frequency features for phonetic and speaker-channel classification , 2000, Speech Commun..
[29] Sarel van Vuuren,et al. Speaker verification in a time-feature space , 1999 .
[30] A. B.,et al. SPEECH COMMUNICATION , 2001 .
[31] R Drullman,et al. Temporal envelope and fine structure cues for speech intelligibility. , 1994, The Journal of the Acoustical Society of America.
[32] Hynek Hermansky,et al. Sub-band based recognition of noisy speech , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[33] Christos Andrea Antoniou,et al. Acoustic modelling using modular/ensemble combinations of heterogeneous neural networks , 2000, INTERSPEECH.
[34] C. B. Pedersen,et al. Temporal Factors in Speech Perception , 1982 .
[35] Hynek Hermansky,et al. Data-driven methods for extracting features from speech , 2000 .
[36] Sangita R. Sharma,et al. Multi-stream approach to robust speech recognition , 1999 .
[37] Steve R. Waterhouse,et al. Ensemble Methods for Phoneme Classification , 1996, NIPS.
[38] Nelson Morgan,et al. Perceptually inspired signal processing strategies for robust speech recognition in reverberant environments , 1998 .
[39] Liang Zhou,et al. Chinese all syllables recognition using combination of multiple classifiers , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[40] H. Hermansky,et al. Syllable intelligibility for temporally filtered LPC cepstral trajectories. , 1999, The Journal of the Acoustical Society of America.
[41] Yochai Konig,et al. "Eigenlips" for robust speech recognition , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[42] Hynek Hermansky,et al. RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..
[43] Harvey Fletcher,et al. Speech and hearing in communication , 1953 .
[44] Peter E. Hart,et al. Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.
[45] Yi Lu. Integration of knowledge in a multiple classifier system , 1994, IEA/AIE '94.
[46] K. Wang,et al. Auditory analysis of spectro-temporal information in acoustic signals , 1995 .
[47] Steven Greenberg,et al. An introduction to the diagnostic evaluation of Switchboard-corpus automatic speech recognition systems , 2000 .
[48] Manfred R. Schroeder,et al. Computer Speech: Recognition, Compression, Synthesis , 1999 .
[49] L. R. Rabiner,et al. Recognition of isolated digits using hidden Markov models with continuous mixture densities , 1985, AT&T Technical Journal.
[50] Steven Greenberg,et al. Speech intelligibility derived from exceedingly sparse spectral information , 1998, ICSLP.
[51] Steven Greenberg,et al. Robust speech recognition using the modulation spectrogram , 1998, Speech Commun..
[52] Alex Waibel,et al. Bimodal sensor integration on the example of 'speechreading' , 1993, IEEE International Conference on Neural Networks.
[53] Sridha Sridharan,et al. Telephone based speaker recognition using multiple binary classifier and Gaussian mixture models , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[54] Shihab A. Shamma. Auditory cortical representation of complex acoustic spectra as inferred from the ripple analysis method , 1996 .
[55] Steven Greenberg,et al. Automatic phonetic transcription of spontaneous speech (American English) , 2000, INTERSPEECH.
[56] Hervé Bourlard,et al. Parallel training of MLP probability estimators for speech recognition: a gender-based approach , 1994, Proceedings of IEEE Workshop on Neural Networks for Signal Processing.
[57] Steven Greenberg,et al. Incorporating information from syllable-length time scales into automatic speech recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[58] Homer Dudley,et al. A Synthetic Speaker , 1939, Science.
[59] Jeff A. Bilmes,et al. Directed graphical models of classifier combination: application to phone recognition , 2000, INTERSPEECH.
[60] M. L. Shire,et al. Data-driven modulation filter design under adverse acoustic conditions and using phonetic and syllabic units , 1999, EUROSPEECH.
[61] A. W. M. van den Enden,et al. Discrete Time Signal Processing , 1989 .
[62] Misha Pavel,et al. On the relative importance of various components of the modulation spectrum for automatic speech recognition , 1999, Speech Commun..
[63] Katrin Kirchhoff. Combining articulatory and acoustic information for speech recognition in noisy and reverberant environments , 1998, ICSLP.
[64] Hervé Bourlard,et al. Connectionist Speech Recognition: A Hybrid Approach , 1993 .
[65] Daniel P. W. Ellis,et al. Multi-stream speech recognition: ready for prime time? , 1999, EUROSPEECH.
[66] Brian Kingsbury,et al. Recognizing reverberant speech with RASTA-PLP , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[67] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[68] R V Shannon,et al. Speech Recognition with Primarily Temporal Cues , 1995, Science.
[69] Hervé Bourlard,et al. A new ASR approach based on independent processing and recombination of partial frequency bands , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[70] Ethem Alpaydin,et al. Combining multiple representations and classifiers for pen-based handwritten digit recognition , 1997, Proceedings of the Fourth International Conference on Document Analysis and Recognition.
[71] N. L. Johnson,et al. Multivariate Analysis , 1958, Nature.
[72] Jean-Claude Junqua,et al. The Lombard effect: a reflex to better communicate with others in noise , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).
[73] Volker Tresp,et al. Combining Estimators Using Non-Constant Weighting Functions , 1994, NIPS.
[74] Katrin Kirchhoff,et al. Robust speech recognition using articulatory information , 1998 .
[75] Barry Y. Chen,et al. Data-driven RASTA filters in reverberation , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[76] K. Hubener,et al. Using multi-level segmentation coefficients to improve HMM speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[77] D. D. Greenwood. Critical Bandwidth and the Frequency Coordinates of the Basilar Membrane , 1961 .
[78] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .
[79] Yochai Konig,et al. A hybrid approach to bimodal speech recognition , 1994, Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers.
[80] Herman J. M. Steeneken,et al. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems , 1993, Speech Commun..
[81] Sarel van Vuuren,et al. Relevancy of time-frequency features for phonetic classification measured by mutual information , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).
[82] Steven Greenberg,et al. The significance of the cochlear traveling wave for theories of frequency analysis and pitch , 1997 .
[83] Daniel P. W. Ellis,et al. Using mutual information to design feature combinations , 2000, INTERSPEECH.
[84] Pieter J. E. Vermeulen,et al. Combining Information from Multiple Classifiers for Speaker Verification , 1998 .
[85] M. L. Shire. Syllable onset detection from acoustics , 1997 .
[86] Dominic W. Massaro,et al. Auditory/visual speech in multimodal human interfaces , 1994, ICSLP.
[87] J. Makhoul,et al. Linear prediction: A tutorial review , 1975, Proceedings of the IEEE.
[88] Steven Greenberg,et al. Linguistic dissection of Switchboard-corpus automatic speech recognition systems , 2000 .
[89] Steven Greenberg,et al. Speaking in shorthand - A syllable-centric perspective for understanding pronunciation variation , 1999, Speech Commun..
[90] John H. L. Hansen,et al. Lombard effect compensation for robust automatic speech recognition in noise , 1990, ICSLP.
[91] Nikki Mirghafori,et al. Combining connectionist multi-band and full-band probability streams for speech recognition of natural numbers , 1998, ICSLP.
[92] H. Hermansky,et al. The modulation spectrum in the automatic recognition of speech , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.
[93] J. Flanagan. Speech Analysis, Synthesis and Perception , 1971 .
[94] John Scott Bridle,et al. Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition , 1989, NATO Neurocomputing.
[95] George Saon,et al. Maximum likelihood discriminant feature spaces , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[96] Daniel P. W. Ellis,et al. Feature extraction using non-linear transformation for robust speech recognition on the Aurora database , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).