Statistical multimodal integration for audio-visual speech processing