Self-supervised Audio-visual Co-segmentation
Chuang Gan | Antonio Torralba | Hang Zhao | Josh H. McDermott | Andrew Rouditchenko
[1] Zhuo Chen, et al. Deep clustering: Discriminative embeddings for segmentation and separation, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Josh H. McDermott. The cocktail party problem, 2009, Current Biology.
[3] G. Kramer. Auditory Scene Analysis: The Perceptual Organization of Sound by Albert Bregman (review), 2016.
[4] Bolei Zhou, et al. Scene Parsing through ADE20K Dataset, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] James R. Glass, et al. Learning modality-invariant representations for speech and images, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[6] Chuang Gan, et al. The Sound of Pixels, 2018, ECCV.
[7] Chenliang Xu, et al. Audio-Visual Event Localization in Unconstrained Videos, 2018, ECCV.
[8] Christopher Burgess, et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, 2016, ICLR 2017.
[9] J. L. Gallant, et al. Sparse coding and decorrelation in primary visual cortex during natural vision, 2000, Science.
[10] Olaf Ronneberger, et al. Invited Talk: U-Net Convolutional Networks for Biomedical Image Segmentation, 2017, Bildverarbeitung für die Medizin.
[11] Harold W. Kuhn, et al. The Hungarian method for the assignment problem, 1955, 50 Years of Integer Programming.
[12] Yoshitaka Nakajima, et al. Auditory Scene Analysis: The Perceptual Organization of Sound by Albert S. Bregman, 1992.
[13] Tuomas Virtanen, et al. Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[14] Gregory Shakhnarovich, et al. Semantic Speech Retrieval With a Visually Grounded Model of Untranscribed Speech, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[15] James R. Glass, et al. Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input, 2018, ECCV.
[16] Kevin Wilson, et al. Looking to listen at the cocktail party, 2018, ACM Trans. Graph.
[17] Andrew Owens, et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features, 2018, ECCV.
[18] Daniel P. W. Ellis, et al. MIR_EVAL: A Transparent Implementation of Common MIR Metrics, 2014, ISMIR.
[19] Maja Pantic, et al. Audio-visual object localization and separation using low-rank and sparsity, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Ben Poole, et al. Categorical Reparameterization with Gumbel-Softmax, 2016, ICLR.
[21] Antonio Torralba, et al. SoundNet: Learning Sound Representations from Unlabeled Video, 2016, NIPS.
[22] Rémi Gribonval, et al. Performance measurement in blind audio source separation, 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[23] Grzegorz Chrupala, et al. Representations of language in a model of visually grounded speech signal, 2017, ACL.
[24] Bolei Zhou, et al. Learning Deep Features for Discriminative Localization, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Andrew Zisserman, et al. Look, Listen and Learn, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[26] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Andrew Owens, et al. Ambient Sound Provides Supervision for Visual Learning, 2016, ECCV.
[28] Tae-Hyun Oh, et al. Learning to Localize Sound Source in Visual Scenes, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.