Weakly Supervised Representation Learning for Audio-Visual Scene Analysis
Gaël Richard | Slim Essid | Sanjeel Parekh | Alexey Ozerov | Ngoc Q. K. Duong | Patrick Pérez
[1] Bhiksha Raj, et al. Audio Event Detection using Weakly Labeled Data, 2016, ACM Multimedia.
[2] Kyogu Lee, et al. Ensemble of Convolutional Neural Networks for Weakly-supervised Sound Event Detection Using Multiple Scale Input, 2017, DCASE.
[3] Fabio Viola, et al. The Kinetics Human Action Video Dataset, 2017, arXiv.
[4] Jian Sun, et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[5] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[6] Frank Melchior, et al. Categorization of broadcast audio objects in complex auditory scenes, 2016.
[7] Koen E. A. van de Sande, et al. Selective Search for Object Recognition, 2013, International Journal of Computer Vision.
[8] Jiebo Luo, et al. Large-scale multimodal semantic concept detection for consumer video, 2007, MIR '07.
[9] Thomas G. Dietterich, et al. Solving the Multiple Instance Problem with Axis-Parallel Rectangles, 1997, Artif. Intell.
[10] Volker Gnann. Source-Filter Based Clustering for Monaural Blind Source Separation, 2009.
[11] Christoph H. Lampert, et al. Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation, 2016, ECCV.
[12] Tuomas Virtanen, et al. Sound event detection using spatial features and convolutional recurrent neural network, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Rémi Gribonval, et al. Performance measurement in blind audio source separation, 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[14] Thomas S. Huang, et al. Real-world acoustic event detection, 2010, Pattern Recognit. Lett.
[15] Rogério Schmidt Feris, et al. Learning to Separate Object Sounds by Watching Unlabeled Video, 2018, ECCV.
[16] Michael Elad, et al. Pixels that sound, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[17] Gaël Richard, et al. Overlapping sound event detection with supervised Nonnegative Matrix Factorization, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Albert S. Bregman, et al. The Auditory Scene. (Book Reviews: Auditory Scene Analysis. The Perceptual Organization of Sound.), 1990.
[19] Chenliang Xu, et al. Audio-Visual Event Localization in Unconstrained Videos, 2018, ECCV.
[20] Jeff A. Bilmes, et al. Deep Canonical Correlation Analysis, 2013, ICML.
[21] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[22] Mark D. Plumbley, et al. Weakly labelled AudioSet Classification with Attention Neural Networks, 2019.
[23] Stan Davis, et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences, 1980.
[24] Kaiming He, et al. Mask R-CNN, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[25] Andrew Owens, et al. Visually Indicated Sounds, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Ivan Laptev, et al. ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization, 2016, ECCV.
[27] Thomas Deselaers, et al. Localizing Objects While Learning Their Appearance, 2010, ECCV.
[28] Yong Xu, et al. Sound Event Detection and Time–Frequency Segmentation from Weakly Labelled Data, 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[29] Ming Yang, et al. Regionlets for Generic Object Detection, 2013, 2013 IEEE International Conference on Computer Vision.
[30] Daphne Koller, et al. Self-Paced Learning for Latent Variable Models, 2010, NIPS.
[31] Bernt Schiele, et al. How good are detection proposals, really?, 2014, BMVC.
[32] Kevin Wilson, et al. Looking to listen at the cocktail party, 2018, ACM Trans. Graph.
[33] Mubarak Shah, et al. Multimodal Analysis for Identification and Segmentation of Moving-Sounding Objects, 2013, IEEE Transactions on Multimedia.
[34] J. Salamon, et al. DCASE 2017 Submission: Multiple Instance Learning for Sound Event Detection, 2017.
[35] Onur Dikmen, et al. Sound event detection in real life recordings using coupled matrix factorization of spectral representations and class activity annotations, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Jitendra Malik, et al. Contextual Action Recognition with R*CNN, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[37] Trevor Darrell, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[38] Bolei Zhou, et al. Learning Deep Features for Discriminative Localization, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Luc Van Gool, et al. Object and Action Classification with Latent Window Parameters, 2013, International Journal of Computer Vision.
[40] Aren Jansen, et al. CNN architectures for large-scale audio classification, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] Chuang Gan, et al. The Sound of Pixels, 2018, ECCV.
[42] Yong Jae Lee, et al. Weakly-supervised Discovery of Visual Pattern Configurations, 2014, NIPS.
[43] Birger Kollmeier, et al. On the use of spectro-temporal features for the IEEE AASP challenge ‘detection and classification of acoustic scenes and events’, 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
[44] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[45] Cordelia Schmid, et al. Spatio-temporal Object Detection Proposals, 2014, ECCV.
[46] Andrew Zisserman, et al. Look, Listen and Learn, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[47] Nancy Bertin, et al. Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis, 2009, Neural Computation.
[48] Cees Snoek, et al. APT: Action localization proposals from dense trajectories, 2015, BMVC.
[49] Bin Yang, et al. Multi-level attention model for weakly supervised audio classification, 2018, DCASE.
[50] Shrikanth S. Narayanan, et al. An Overview on Perceptually Motivated Audio Indexing and Classification, 2013, Proceedings of the IEEE.
[51] Anurag Kumar, et al. Knowledge Transfer from Weakly Labeled Audio Using Convolutional Neural Network for Sound Events and Scenes, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[52] Andrew Zisserman, et al. Objects that Sound, 2017, ECCV.
[53] Jiasen Lu, et al. Hierarchical Question-Image Co-Attention for Visual Question Answering, 2016, NIPS.
[54] Patrick Pérez, et al. Identify, Locate and Separate: Audio-Visual Object Extraction in Large Video Collections Using Weak Supervision, 2018, 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
[55] Ross B. Girshick. Fast R-CNN, 2015, arXiv:1504.08083.
[56] Cordelia Schmid, et al. Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[57] Andrea Vedaldi, et al. Weakly Supervised Deep Detection Networks, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[58] Emmanuel Vincent, et al. Single-channel audio source separation with NMF: divergences, constraints and algorithms, 2018.
[59] Apostol Natsev, et al. YouTube-8M: A Large-Scale Video Classification Benchmark, 2016, arXiv.
[60] Andrew Owens, et al. Ambient Sound Provides Supervision for Visual Learning, 2016, ECCV.
[61] Justin Salamon, et al. Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[62] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[63] Shih-Fu Chang, et al. Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[64] Ankit Shah, et al. DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System, 2017, DCASE.
[65] Shih-Fu Chang, et al. Short-term audio-visual atoms for generic video concept classification, 2009, ACM Multimedia.
[66] T. Tuytelaars, et al. Weakly Supervised Object Detection with Posterior Regularization, 2014.
[67] Ivan Laptev, et al. Is object localization for free? - Weakly-supervised learning with convolutional neural networks, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[68] Florian Metze, et al. A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[69] Mubarak Shah, et al. High-level event recognition in unconstrained videos, 2013, International Journal of Multimedia Information Retrieval.
[70] G. Kramer. Auditory Scene Analysis: The Perceptual Organization of Sound by Albert Bregman (review), 2016.
[71] Aren Jansen, et al. Audio Set: An ontology and human-labeled dataset for audio events, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[72] Yong Xu, et al. Surrey-cvssp system for DCASE2017 challenge task4, 2017, arXiv.
[73] Paul A. Viola, et al. Multiple Instance Boosting for Object Detection, 2005, NIPS.
[74] C. Lawrence Zitnick, et al. Edge Boxes: Locating Object Proposals from Edges, 2014, ECCV.