Audio-Visual Model Distillation Using Acoustic Images
Vittorio Murino | Pietro Morerio | Andrés F. Pérez | Valentina Sanguineti
[1] Luc Van Gool,et al. AENet: Learning Deep Audio Features for Video Analysis , 2017, IEEE Transactions on Multimedia.
[2] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[3] Patrick Pérez,et al. Weakly Supervised Representation Learning for Unsynchronized Audio-Visual Events , 2018, CVPR Workshops.
[4] M. Wallace,et al. Converging influences from visual, auditory, and somatosensory cortices onto output neurons of the superior colliculus. , 1993, Journal of neurophysiology.
[5] Andrew Zisserman,et al. Objects that Sound , 2017, ECCV.
[6] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Vittorio Murino,et al. Modality Distillation with Multiple Stream Networks for Action Recognition , 2018, ECCV.
[8] Roberto Arrighi,et al. Meaningful auditory information enhances perception of visual biological motion. , 2009, Journal of vision.
[9] Hiroko Terasawa,et al. A statistical model of timbre perception , 2006, SAPA@INTERSPEECH.
[10] Yoshua Bengio,et al. Object Recognition with Gradient-Based Learning , 1999, Shape, Contour and Grouping in Computer Vision.
[11] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[12] Yaxin Bi,et al. KNN Model-Based Approach in Classification , 2003, OTM.
[13] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph..
[14] Antonio Torralba,et al. Learning Aligned Cross-Modal Representations from Weakly Aligned Data , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Andrew Owens,et al. Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning , 2017, International Journal of Computer Vision.
[16] B.P. Yuhas,et al. Integration of acoustic and visual speech signals using neural networks , 1989, IEEE Communications Magazine.
[17] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Kristen Grauman,et al. 2.5D Visual Sound , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[20] Karol J. Piczak. ESC: Dataset for Environmental Sound Classification , 2015, ACM Multimedia.
[21] Tuomas Virtanen,et al. A multi-device dataset for urban acoustic scene classification , 2018, DCASE.
[22] M. Melamed. Detection , 2021, SETI: Astronomy as a Contact Sport.
[23] Bernhard Schölkopf,et al. Unifying distillation and privileged information , 2015, ICLR.
[24] James R. Glass,et al. Unsupervised Learning of Spoken Language with Visual Context , 2016, NIPS.
[25] Xinyu Li,et al. Multi-stream Network With Temporal Attention For Environmental Sound Classification , 2019, INTERSPEECH.
[26] Trevor Darrell,et al. Learning with Side Information through Modality Hallucination , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Shmuel Peleg,et al. Visual Speech Enhancement , 2017, INTERSPEECH.
[28] Tae-Hyun Oh,et al. Learning to Localize Sound Source in Visual Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[29] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[30] Golubkov Alexander. Acoustic Scene Classification Using Convolutional Neural Networks and Different Channels Representations and Its Fusion , 2018, Technical Report.
[31] William W. Gaver. What in the World Do We Hear? An Ecological Approach to Auditory Event Perception , 1993.
[32] Tae-Hyun Oh,et al. On Learning Association of Sound Source and Visual Scenes , 2018, CVPR Workshops.
[33] Chenliang Xu,et al. Audio-Visual Event Localization in Unconstrained Videos , 2018, ECCV.
[34] Yann LeCun,et al. A Closer Look at Spatiotemporal Convolutions for Action Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[35] Andrew Owens,et al. Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.
[36] Yoshua Bengio,et al. Speaker Recognition from Raw Waveform with SincNet , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[37] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[38] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[39] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[40] Andrew Owens,et al. Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Virginia R. de Sa,et al. Learning Classification with Unlabeled Data , 1993, NIPS.
[42] Xinxing Chen. Acoustic Scene Classification Using Multi-Scale Features , 2018, Technical Report.
[43] Chuang Gan,et al. The Sound of Pixels , 2018, ECCV.
[44] Joon Son Chung,et al. The Conversation: Deep Audio-Visual Speech Enhancement , 2018, INTERSPEECH.
[45] Alessio Del Bue,et al. Seeing the Sound: A New Multimodal Imaging Device for Computer Vision , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).
[46] Antonio Torralba,et al. See, Hear, and Read: Deep Aligned Representations , 2017, ArXiv.
[47] Michael Elad,et al. Pixels that sound , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[48] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[49] Dmitriy Serdyuk,et al. Unsupervised adversarial domain adaptation for acoustic scene classification , 2018, ArXiv.