Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
Andrew Owens | Jiajun Wu | Antonio Torralba | William T. Freeman | Josh H. McDermott
[1] William W. Gaver. What in the World Do We Hear? An Ecological Approach to Auditory Event Perception , 1993 .
[2] Virginia R. de Sa,et al. Learning Classification with Unlabeled Data , 1993, NIPS.
[3] Piotr Indyk,et al. Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.
[4] Javier R. Movellan,et al. Audio Vision: Using Audio-Visual Synchrony to Locate Sounds , 1999, NIPS.
[5] Trevor Darrell,et al. Learning Joint Statistical Models for Audio-Visual Fusion and Segregation , 2000, NIPS.
[6] Malcolm Slaney,et al. FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks , 2000, NIPS.
[7] Jitendra Malik,et al. Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons , 2001, International Journal of Computer Vision.
[8] Michael Gasser,et al. The Development of Embodied Cognition: Six Lessons from Babies , 2005, Artificial Life.
[9] Michael Elad,et al. Pixels that sound , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[10] Vesa T. Peltonen,et al. Audio-based context recognition , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[11] George Loizou,et al. Computer vision and pattern recognition , 2007, Int. J. Comput. Math.
[12] Antonio Torralba,et al. Spectral Hashing , 2008, NIPS.
[13] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[14] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.
[15] Geoffrey E. Hinton,et al. Semantic hashing , 2009, Int. J. Approx. Reason.
[16] Hossein Mobahi,et al. Deep learning from temporal coherence in video , 2009, ICML '09.
[17] Daniel P. W. Ellis,et al. Detecting local semantic concepts in environmental sounds using Markov model based clustering , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[18] Krista A. Ehinger,et al. SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[19] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[20] Eero P. Simoncelli,et al. Sound Texture Perception via Statistics of the Auditory Periphery: Evidence from Sound Synthesis , 2011, Neuron.
[21] Daniel P. W. Ellis,et al. Classifying soundtracks with audio texture features , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Nitish Srivastava,et al. Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res.
[23] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[24] Marc'Aurelio Ranzato,et al. Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[25] Jeff A. Bilmes,et al. Deep Canonical Correlation Analysis , 2013, ICML.
[26] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[27] Bolei Zhou,et al. Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.
[28] Thomas Brox,et al. Discriminative Unsupervised Feature Learning with Convolutional Neural Networks , 2014, NIPS.
[29] Jonathan Tompson,et al. Unsupervised Feature Learning from Temporal Data , 2015, ICLR.
[30] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[31] Edward H. Adelson,et al. Learning visual groups from co-occurrences in space and time , 2015, ArXiv.
[32] Nitish Srivastava. Unsupervised Learning of Visual Representations using Videos , 2015 .
[33] Abhinav Gupta,et al. Unsupervised Learning of Visual Representations Using Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[34] Phillip Isola. The Discovery of perceptual structure from visual co-occurrences in space and time , 2015 .
[35] Ross B. Girshick,et al. Fast R-CNN , 2015, arXiv:1504.08083.
[36] Bolei Zhou,et al. Object Detectors Emerge in Deep Scene CNNs , 2014, ICLR.
[37] David A. Shamma,et al. The New Data and New Challenges in Multimedia Research , 2015, ArXiv.
[38] Kristen Grauman,et al. Learning Image Representations Tied to Ego-Motion , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[39] Jitendra Malik,et al. Learning to See by Moving , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[40] Ivan Laptev,et al. Is object localization for free? - Weakly-supervised learning with convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Alexei A. Efros,et al. Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[42] Bolei Zhou,et al. Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Jitendra Malik,et al. Cross Modal Distillation for Supervision Transfer , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[45] Alexei A. Efros,et al. Colorful Image Colorization , 2016, ECCV.
[46] Dorothy Wilson. The Discovery of Perceptual Structure from Visual Co-occurrences in Space and Time , 2016 .
[47] Andrew Owens,et al. Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.
[48] Jiri Matas,et al. All you need is a good init , 2015, ICLR.
[49] Andrew Owens,et al. Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Trevor Darrell,et al. Data-dependent Initializations of Convolutional Neural Networks , 2015, ICLR.
[51] Trevor Darrell,et al. Learning Features by Watching Objects Move , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[53] Alexei A. Efros,et al. Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Andrew Zisserman,et al. Multi-task Self-Supervised Visual Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[55] Bolei Zhou,et al. Network Dissection: Quantifying Interpretability of Deep Visual Representations , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[57] Virginia R. de Sa. Minimizing Disagreement for Self-Supervised Classification , 2022 .