Multimodal Analysis for Identification and Segmentation of Moving-Sounding Objects
[1] Nebojsa Jojic,et al. A Graphical Model for Audiovisual Object Tracking , 2003, IEEE Trans. Pattern Anal. Mach. Intell..
[2] Chuohao Yeo,et al. Visual speaker localization aided by acoustic models , 2009, MM '09.
[3] Irena Koprinska,et al. Temporal video segmentation: A survey , 2001, Signal Process. Image Commun..
[4] Hiroaki Kitano,et al. Real-time speaker localization and speech separation by audio-visual integration , 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292).
[5] Trevor Darrell,et al. Audio-visual Segmentation and "The Cocktail Party Effect" , 2000, ICMI.
[6] Tae-Kyun Kim,et al. Canonical Correlation Analysis of Video Volume Tensors for Action Categorization and Detection , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[7] Jitendra Malik,et al. Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[8] H. Hotelling. Relations Between Two Sets of Variates , 1936 .
[9] Ramani Duraiswami,et al. Microphone Arrays as Generalized Cameras for Integrated Audio Visual Processing , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[10] Shih-Fu Chang,et al. Short-term audio-visual atoms for generic video concept classification , 2009, ACM Multimedia.
[11] Meng Wang,et al. Dynamic captioning: video accessibility enhancement for hearing impairment , 2010, ACM Multimedia.
[12] Larry S. Davis,et al. Look who's talking: speaker detection using video and audio correlation , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).
[13] Yoichi Sato,et al. Visual localization of non-stationary sound sources , 2009, ACM Multimedia.
[14] Trevor Darrell,et al. Speaker association with signal-level audiovisual fusion , 2004, IEEE Transactions on Multimedia.
[15] Michael Elad,et al. Pixels that sound , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[16] Declan Murphy,et al. Conducting Audio Files via Computer Vision , 2003, Gesture Workshop.
[17] Michael Elad,et al. Cross-Modal Localization via Sparsity , 2007, IEEE Transactions on Signal Processing.
[18] Yoav Y. Schechner,et al. Onsets Coincidence for Cross-Modal Analysis , 2010, IEEE Transactions on Multimedia.
[19] Stefano Soatto,et al. Quick Shift and Kernel Methods for Mode Seeking , 2008, ECCV.
[20] Michael A. Saunders,et al. Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..
[21] Malcolm Slaney,et al. FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks , 2000, NIPS.
[22] Biing-Hwang Juang,et al. Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.
[23] Josef Kittler,et al. Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[24] A. Murat Tekalp,et al. Audiovisual Synchronization and Fusion Using Canonical Correlation Analysis , 2007, IEEE Transactions on Multimedia.
[25] Patrick Pérez,et al. Data fusion for visual tracking with particles , 2004, Proceedings of the IEEE.
[26] John Shawe-Taylor,et al. Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.
[27] Pierre Vandergheynst,et al. Audiovisual Gestalts , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).
[28] Trevor Darrell,et al. Learning Joint Statistical Models for Audio-Visual Fusion and Segregation , 2000, NIPS.
[29] Ishwar K. Sethi,et al. Multimedia content processing through cross-modal association , 2003, MULTIMEDIA '03.
[30] Shaogang Gong,et al. Multi-camera activity correlation analysis , 2009, CVPR.
[31] Sudeep Sarkar,et al. Exploring Co-Occurence Between Speech and Body Movement for Audio-Guided Video Localization , 2008, IEEE Transactions on Circuits and Systems for Video Technology.
[32] Harriet J. Nock,et al. Assessing face and speech consistency for monologue detection in video , 2002, MULTIMEDIA '02.