Pixels that sound
[1] Nebojsa Jojic, et al. A Graphical Model for Audiovisual Object Tracking, 2003, IEEE Trans. Pattern Anal. Mach. Intell.
[2] Michael Elad, et al. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization, 2003, Proceedings of the National Academy of Sciences of the United States of America.
[3] Harriet J. Nock, et al. Assessing face and speech consistency for monologue detection in video, 2002, MULTIMEDIA '02.
[4] B. Moor, et al. On the Regularization of Canonical Correlation Analysis, 2003.
[5] A. G. Flesia, et al. Can recent innovations in harmonic analysis 'explain' key findings in natural image statistics?, 2001, Network.
[6] Chalapathy Neti, et al. Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization), 2002, Sensor Array and Multichannel Signal Processing Workshop Proceedings, 2002.
[7] H. Knutsson, et al. Learning Canonical Correlations, 1997.
[8] Malcolm Slaney, et al. FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks, 2000, NIPS.
[9] James M. Olson, et al. Gated Visual Input to the Central Auditory System, 2002.
[10] Trevor Darrell, et al. Learning Joint Statistical Models for Audio-Visual Fusion and Segregation, 2000, NIPS.
[11] Gunnar Farnebäck. A Unified Framework for Bases, Frames, Subspace Bases, and Subspace Frames, 1999.
[12] Rémi Gribonval, et al. Sparse representations in unions of bases, 2003, IEEE Trans. Inf. Theory.
[13] Stéphane Mallat, et al. A Theory for Multiresolution Signal Decomposition: The Wavelet Representation, 1989, IEEE Trans. Pattern Anal. Mach. Intell.
[14] Michael Elad, et al. A generalized uncertainty principle and sparse representation in pairs of bases, 2002, IEEE Trans. Inf. Theory.
[15] Horst-Michael Groß, et al. A Computational Model of Early Auditory-Visual Integration, 2003, DAGM-Symposium.
[16] Trevor Darrell, et al. Speaker association with signal-level audiovisual fusion, 2004, IEEE Transactions on Multimedia.
[17] Michael I. Jordan, et al. Kernel independent component analysis, 2003.
[18] Hans Knutsson, et al. Learning multidimensional signal processing, 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170).
[19] Chun Chen, et al. Audio-visual based emotion recognition - a new approach, 2004, CVPR 2004.
[20] Robert B. Ash, et al. Information Theory, 2020, The SAGE International Encyclopedia of Mass Media and Society.
[21] J. Driver. Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading, 1996, Nature.
[22] Yochai Konig, et al. "Eigenlips" for robust speech recognition, 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[23] Larry S. Davis, et al. Look who's talking: speaker detection using video and audio correlation, 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).
[24] D. Feldman, et al. An Anatomical Basis for Visual Calibration of the Auditory Space Map in the Barn Owl's Midbrain, 1997, The Journal of Neuroscience.
[25] Javier R. Movellan, et al. Audio Vision: Using Audio-Visual Synchrony to Locate Sounds, 1999, NIPS.
[26] Horst Bischof, et al. Appearance models based on kernel canonical correlation analysis, 2003, Pattern Recognit.
[27] D. Donoho. For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution, 2006.
[28] Ishwar K. Sethi, et al. Multimedia content processing through cross-modal association, 2003, MULTIMEDIA '03.
[29] Lior Wolf, et al. Learning over Sets using Kernel Principal Angles, 2003, J. Mach. Learn. Res.
[30] B. Moor, et al. Subspace angles and distances between ARMA models, 2000.
[31] Michael Elad, et al. A Probabilistic Study of the Average Performance of the Basis Pursuit, 2004.
[32] Patrick Pérez, et al. Sequential Monte Carlo Fusion of Sound and Vision for Speaker Tracking, 2001, ICCV.
[33] Declan Murphy, et al. Conducting Audio Files via Computer Vision, 2003, Gesture Workshop.
[34] Paris Smaragdis, et al. Audio/Visual Independent Components, 2003.