Audio-Visual Speech Modeling for Continuous Speech Recognition
[1] Alex Pentland,et al. 3D modeling and tracking of human lip motions , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).
[2] Sadaoki Furui,et al. Speaker-independent isolated word recognition using dynamic features of speech spectrum , 1986, IEEE Trans. Acoust. Speech Signal Process..
[3] Juergen Luettin,et al. Speechreading using Probabilistic Models , 1997, Comput. Vis. Image Underst..
[4] Jont B. Allen. How do humans process and recognize speech , 1993 .
[5] H. Sakoe,et al. Two-level DP-matching--A dynamic programming-based pattern matching algorithm for connected word recognition , 1979 .
[6] Gerasimos Potamianos,et al. An image transform approach for HMM based automatic lipreading , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).
[7] Harvey Fletcher,et al. Speech and hearing in communication , 1953 .
[8] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[9] Hynek Hermansky,et al. RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..
[10] Biing-Hwang Juang,et al. Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.
[11] Juergen Luettin,et al. Visual Speech and Speaker Recognition , 1997 .
[12] Alex Pentland,et al. Automatic lipreading by optical-flow analysis , 1989 .
[13] Alan C. Bovik,et al. Computer lipreading for improved accuracy in automatic speech recognition , 1996, IEEE Trans. Speech Audio Process..
[14] H. Sakoe,et al. Two-level DP-matching algorithm-a dynamic programming based pattern matching algorithm for continuous speech recognition , 1979 .
[15] John A. Nelder,et al. A Simplex Method for Function Minimization , 1965, Comput. J..
[16] W. H. Sumby,et al. Visual contribution to speech intelligibility in noise , 1954 .
[17] L. Braida. Crossmodal Integration in the Identification of Consonant Segments , 1991, The Quarterly journal of experimental psychology. A, Human experimental psychology.
[18] Martin J. Russell,et al. Integrating audio and visual information to provide highly robust speech recognition , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[19] Gérard Chollet,et al. Swiss French PolyPhone and PolyVar: telephone speech databases to model inter- and intra-speaker variability , 1996 .
[20] Q. Summerfield,et al. Lipreading and audio-visual speech perception. , 1992, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.
[21] L. Braida,et al. Evaluating the articulation index for auditory-visual input. , 1987, The Journal of the Acoustical Society of America.
[22] Stephen M. Omohundro,et al. Nonlinear manifold learning for visual speech recognition , 1995, Proceedings of IEEE International Conference on Computer Vision.
[23] David Taylor. Hearing by Eye: The Psychology of Lip-Reading , 1988 .
[24] Mervyn A. Jack,et al. Weighted Viterbi algorithm and state duration modelling for speech recognition in noise , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[25] Gerasimos Potamianos,et al. Discriminative training of HMM stream exponents for audio-visual speech recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[26] Steven Greenberg,et al. Speech intelligibility in the presence of cross-channel spectral asynchrony , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[27] Gérard Chollet,et al. Toward Markov random field modeling of speech , 1998, ICSLP.
[28] Vincent J. van Heuven,et al. Intelligibility of audio-visually desynchronised speech: asymmetrical effect of phoneme position , 1992, ICSLP.
[29] Hervé Bourlard,et al. Subband-based speech recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[30] Javier R. Movellan,et al. Dynamic Features for Visual Speechreading: A Systematic Comparison , 1996, NIPS.
[31] Hervé Bourlard,et al. Connectionist Speech Recognition: A Hybrid Approach , 1993 .
[32] Dominic W. Massaro,et al. Perceiving asynchronous bimodal speech in consonant-vowel and vowel syllables , 1993, Speech Commun..
[33] Terrence J. Sejnowski,et al. Neural network models of sensory integration for improved vowel recognition , 1990, Proc. IEEE.
[34] Timothy F. Cootes,et al. Active Shape Models-Their Training and Application , 1995, Comput. Vis. Image Underst..
[35] Lorenzo Torresani,et al. 2D Deformable Models for Visual Speech Analysis , 1996 .
[36] Louis D. Braida,et al. Evaluating the articulation index for auditory-visual input. , 1987, The Journal of the Acoustical Society of America.
[37] N. P. Erber,et al. Voice/mouth synthesis and tactual/visual perception of /pa, ba, ma/. , 1978, The Journal of the Acoustical Society of America.
[38] H. Hermansky,et al. Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.
[39] Yifan Gong,et al. Speech recognition in noisy environments: A survey , 1995, Speech Commun..
[40] Roger K. Moore,et al. Hidden Markov model decomposition of speech and noise , 1990, International Conference on Acoustics, Speech, and Signal Processing.
[41] Eric David Petajan,et al. Automatic Lipreading to Enhance Speech Recognition (Speech Reading) , 1984 .
[42] Michael I. Jordan,et al. Hidden Markov Decision Trees , 1996, NIPS.
[43] J. L. Miller,et al. On the role of visual rate information in phonetic perception , 1985, Perception & psychophysics.
[44] Luc Vandendorpe,et al. The M2VTS Multimodal Face Database (Release 1.00) , 1997, AVBPA.
[45] Les E. Atlas,et al. The challenge of spoken language systems: research directions for the nineties , 1995, IEEE Trans. Speech Audio Process..
[46] Timothy F. Cootes,et al. Automatic Interpretation and Coding of Face Images Using Flexible Models , 1997, IEEE Trans. Pattern Anal. Mach. Intell..
[47] Christophe Ris,et al. Assessing local noise level estimation methods: Application to noise robust ASR , 2000, Speech Commun..
[48] Richard Lippmann,et al. Speech recognition by machines and humans , 1997, Speech Commun..