Detecting audio-visual synchrony using deep neural networks
Vaibhava Goel | Gerasimos Potamianos | Etienne Marcheret | Josef Vopicka
[1] B. P. Yuhas, et al. Integration of acoustic and visual speech signals using neural networks, 1989, IEEE Communications Magazine.
[2] Alexander H. Waibel, et al. See Me, Hear Me: Integrating Automatic Speech Recognition and Lip-reading, 1994.
[3] Javier R. Movellan, et al. Audio Vision: Using Audio-Visual Synchrony to Locate Sounds, 1999, NIPS.
[4] Jon Barker, et al. Evidence of correlation between acoustic and visual features of speech, 1999.
[5] Larry S. Davis, et al. Look who's talking: speaker detection using video and audio correlation, 2000, IEEE International Conference on Multimedia and Expo (ICME 2000).
[6] Malcolm Slaney, et al. FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks, 2000, NIPS.
[7] Jean-Philippe Thiran, et al. Feature space mutual information in speech-video sequences, 2002, IEEE International Conference on Multimedia and Expo (ICME).
[8] Harriet J. Nock, et al. Assessing face and speech consistency for monologue detection in video, 2002, ACM Multimedia (MULTIMEDIA '02).
[9] Chalapathy Neti, et al. Recent advances in the automatic recognition of audiovisual speech, 2003, Proceedings of the IEEE.
[10] Harriet J. Nock, et al. Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study, 2003, CIVR.
[11] Trevor Darrell, et al. Speaker association with signal-level audiovisual fusion, 2004, IEEE Transactions on Multimedia.
[12] Michael Wagner, et al. Liveness Verification in Audio-Video Speaker Authentication, 2004.
[13] John Shawe-Taylor, et al. Canonical Correlation Analysis: An Overview with Application to Learning Methods, 2004, Neural Computation.
[14] Laurent Besacier, et al. A speaker independent "liveness" test for audio-visual biometrics, 2005, INTERSPEECH.
[15] Jean-Philippe Thiran, et al. Multimodal speaker localization in a probabilistic framework, 2006, 14th European Signal Processing Conference (EUSIPCO).
[16] M. E. Sargin, et al. Audiovisual Synchronization and Fusion Using Canonical Correlation Analysis, 2007, IEEE Transactions on Multimedia.
[17] Gérard Chollet, et al. Audiovisual Speech Synchrony Measure: Application to Biometrics, 2007, EURASIP Journal on Advances in Signal Processing.
[18] Enrique Argones-Rúa, et al. Audio-visual speech asynchrony detection using co-inertia analysis and coupled hidden Markov models, 2009, Pattern Analysis and Applications.
[19] Gerasimos Potamianos, et al. Robust audio-visual speech synchrony detection by generalized bimodal linear prediction, 2009, INTERSPEECH.
[20] Fabien Ringeval, et al. Maximising Audiovisual Correlation with Automatic Lip Tracking and Vowel Based Segmentation, 2009, COST 2101/2102 Conference.
[21] Oscar Déniz-Suárez, et al. A comparison of face and facial feature detectors based on the Viola–Jones general object detection framework, 2011, Machine Vision and Applications.
[22] Juhan Nam, et al. Multimodal Deep Learning, 2011, ICML.
[23] Gerasimos Potamianos, et al. Audiovisual Speech Synchrony Detection by a Family of Bimodal Linear Prediction Models, 2011, in Multibiometrics for Human Identification.
[24] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine (DOI 10.1109/MSP.2012.2205597).
[25] S. Mallat, et al. Invariant Scattering Convolution Networks, 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[26] Jing Huang, et al. Audio-visual deep learning for noise robust speech recognition, 2013, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Stéphane Mallat, et al. Rotation, Scaling and Deformation Invariant Scattering for Texture Discrimination, 2013, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Tetsuya Ogata, et al. Audio-visual speech recognition using deep learning, 2014, Applied Intelligence.
[29] Vaibhava Goel, et al. Deep multimodal learning for Audio-Visual Speech Recognition, 2015, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).