The Sound of Motions
暂无分享,去创建一个
Chuang Gan | Antonio Torralba | Hang Zhao | Wei-Chiu Ma | A. Torralba | Chuang Gan | Wei-Chiu Ma | Hang Zhao
[1] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph..
[2] Mubarak Shah,et al. Multimodal Analysis for Identification and Segmentation of Moving-Sounding Objects , 2013, IEEE Transactions on Multimedia.
[3] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[4] Tuomas Virtanen,et al. Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[5] Chuang Gan,et al. The Sound of Pixels , 2018, ECCV.
[6] Andrew Zisserman,et al. Objects that Sound , 2017, ECCV.
[7] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] John F. Canny,et al. A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[9] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[10] Yoav Y. Schechner,et al. Harmony in Motion , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[11] DeLiang Wang,et al. Supervised Speech Separation Based on Deep Learning: An Overview , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[12] Daniel P. W. Ellis,et al. MIR_EVAL: A Transparent Implementation of Common MIR Metrics , 2014, ISMIR.
[13] P. Smaragdis,et al. Non-negative matrix factorization for polyphonic music transcription , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).
[14] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Jan Kautz,et al. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[16] Andrew Zisserman,et al. Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[17] Hang Zhou,et al. Talking Face Generation by Adversarially Disentangled Audio-Visual Representation , 2018, AAAI.
[18] Andrzej Cichocki,et al. Nonnegative Matrix and Tensor Factorization T , 2007 .
[19] Rogério Schmidt Feris,et al. Learning to Separate Object Sounds by Watching Unlabeled Video , 2018, ECCV.
[20] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Ieee Xplore,et al. IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[22] Chen Fang,et al. Visual to Sound: Generating Natural Sound for Videos in the Wild , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[23] Shmuel Peleg,et al. Seeing Through Noise: Visually Driven Speaker Separation And Enhancement , 2017, ICASSP.
[24] Trevor Darrell,et al. Learning Joint Statistical Models for Audio-Visual Fusion and Segregation , 2000, NIPS.
[25] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[26] Shmuel Peleg,et al. Seeing Through Noise: Speaker Separation and Enhancement using Visually-derived Speech , 2017, ArXiv.
[27] Josh H. McDermott. The cocktail party problem , 2009, Current Biology.
[28] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[29] Kristen Grauman,et al. 2.5D Visual Sound , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Leonidas J. Guibas,et al. Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[31] Jesper Jensen,et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Cordelia Schmid,et al. Action recognition by dense trajectories , 2011, CVPR 2011.
[33] Gaurav Sharma,et al. Creating a Multitrack Classical Music Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications , 2016, IEEE Transactions on Multimedia.
[34] Cordelia Schmid,et al. Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.
[35] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[36] Dahua Lin,et al. Trajectory Convolution for Action Recognition , 2018, NeurIPS.
[37] Aaron C. Courville,et al. FiLM: Visual Reasoning with a General Conditioning Layer , 2017, AAAI.
[38] Chuang Gan,et al. Self-supervised Audio-visual Co-segmentation , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[40] Nuno Vasconcelos,et al. Self-Supervised Generation of Spatial Audio for 360 Video , 2018, NIPS 2018.
[41] Cordelia Schmid,et al. A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.
[42] Simon Haykin,et al. The Cocktail Party Problem , 2005, Neural Computation.
[43] Michael Elad,et al. Pixels that sound , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[44] Zhuo Chen,et al. Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[45] Abhinav Gupta,et al. Videos as Space-Time Region Graphs , 2018, ECCV.
[46] Chuang Gan,et al. End-to-End Learning of Motion Representation for Video Understanding , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[47] Xiao Liu,et al. Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[48] Andrew Owens,et al. Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.
[49] Tae-Hyun Oh,et al. Learning to Localize Sound Source in Visual Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[50] B. Stein,et al. The Merging of the Senses , 1993 .
[51] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Emilia Gómez,et al. Monoaural Audio Source Separation Using Deep Convolutional Neural Networks , 2017, LVA/ICA.
[53] Mark D. Plumbley,et al. Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network , 2015, LVA/ICA.
[54] Ivan Laptev,et al. On Space-Time Interest Points , 2005, International Journal of Computer Vision.
[55] Javier R. Movellan,et al. Audio Vision: Using Audio-Visual Synchrony to Locate Sounds , 1999, NIPS.
[56] Lorenzo Torresani,et al. Co-Training of Audio and Video Representations from Self-Co-Training of Audio and Video Representations from Self-Supervised Temporal Synchronization Supervised Temporal Synchronization , 2018 .
[57] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.
[58] Xiao Liu,et al. Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification , 2017, ArXiv.