Multimodal speaker localization in a probabilistic framework
[1] H. McGurk, et al. Hearing lips and seeing voices, 1976, Nature.
[2] Jean-Philippe Thiran, et al. From error probability to information theoretic (multi-modal) signal processing, 2005, Signal Process..
[3] Pierre Vandergheynst, et al. Experimental evaluation framework for speaker detection on the CUAVE database, 2006.
[4] Javier R. Movellan, et al. Audio Vision: Using Audio-Visual Synchrony to Locate Sounds, 1999, NIPS.
[5] Jean-Philippe Thiran, et al. A multimodal approach to extract optimized audio features for speaker detection, 2005, 2005 13th European Signal Processing Conference.
[6] Trevor Darrell, et al. Speaker association with signal-level audiovisual fusion, 2004, IEEE Transactions on Multimedia.
[7] Sabri Gurbuz, et al. Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus, 2002, EURASIP J. Adv. Signal Process..
[8] Jean-Philippe Thiran, et al. Feature space mutual information in speech-video sequences, 2002, Proceedings. IEEE International Conference on Multimedia and Expo.
[9] Malcolm Slaney, et al. FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks, 2000, NIPS.
[10] Harriet J. Nock, et al. Assessing face and speech consistency for monologue detection in video, 2002, MULTIMEDIA '02.
[11] Juergen Luettin, et al. Audio-Visual Automatic Speech Recognition: An Overview, 2004.
[12] Trevor Darrell, et al. Learning Joint Statistical Models for Audio-Visual Fusion and Segregation, 2000, NIPS.
[13] Christopher M. Bishop, et al. Neural networks for pattern recognition, 1995.
[14] Harriet J. Nock, et al. Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study, 2003, CIVR.