Sound Event Detection and Time–Frequency Segmentation from Weakly Labelled Data
暂无分享,去创建一个
Yong Xu | Mark D. Plumbley | Qiuqiang Kong | Wenwu Wang | Iwona Sobieraj | Yong Xu | Qiuqiang Kong | Wenwu Wang | I. Sobieraj
[1] Hiroshi Sawada,et al. Reducing musical noise by a fine-shift overlap-add method applied to source separation using a time-frequency mask , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..
[2] Mark B. Sandler,et al. Automatic Tagging Using Deep Convolutional Neural Networks , 2016, ISMIR.
[3] Dong Yu,et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[4] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.
[5] Daniel P. W. Ellis,et al. General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline , 2018, DCASE.
[6] Heikki Huttunen,et al. Recurrent neural networks for polyphonic sound event detection in real life recordings , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] DeLiang Wang,et al. Ideal ratio mask estimation using deep neural networks for robust speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[8] Daniel P. W. Ellis,et al. Detecting Alarm Sounds , 2001 .
[9] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[10] Yi-Hsuan Yang,et al. Weakly-supervised audio event detection using event-specific Gaussian filters and fully convolutional networks , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Janto Skowronek,et al. Automatic surveillance of the acoustic activity in our living environment , 2005, 2005 IEEE International Conference on Multimedia and Expo.
[12] Augusto Sarti,et al. Scream and gunshot detection and localization for audio-surveillance systems , 2007, 2007 IEEE Conference on Advanced Video and Signal Based Surveillance.
[13] Yong Xu,et al. A joint detection-classification model for audio tagging of weakly labelled data , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[15] M. Hariharan,et al. Automatic classification of infant cry: A review , 2012, 2012 International Conference on Biomedical Engineering (ICoBE).
[16] Ali Borji,et al. Salient Object Detection: A Benchmark , 2015, IEEE Transactions on Image Processing.
[17] Richard M. Dansereau,et al. Single-Channel Speech Separation Using Soft Mask Filtering , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[18] Andrew Zisserman,et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.
[19] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[20] Tuomas Virtanen,et al. TUT database for acoustic scene classification and sound event detection , 2016, 2016 24th European Signal Processing Conference (EUSIPCO).
[21] Qiang Huang,et al. Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[22] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[23] Annamaria Mesaros,et al. Metrics for Polyphonic Sound Event Detection , 2016 .
[24] Yong Xu,et al. Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Bhiksha Raj,et al. Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data , 2017, ArXiv.
[26] J. Hanley,et al. The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.
[27] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[28] Gerald Penn,et al. Convolutional Neural Networks for Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[29] Qiang Huang,et al. Attention and Localization Based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging , 2017, INTERSPEECH.
[30] Bhiksha Raj,et al. Audio Event Detection using Weakly Labeled Data , 2016, ACM Multimedia.
[31] Florian Metze,et al. Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection , 2017, INTERSPEECH.
[32] Apostol Natsev,et al. YouTube-8M: A Large-Scale Video Classification Benchmark , 2016, ArXiv.
[33] Kyogu Lee,et al. Ensemble of Convolutional Neural Networks for Weakly-supervised Sound Event Detection Using Multiple Scale Input , 2017, DCASE.
[34] Dan Stowell,et al. Detection and classification of acoustic scenes and events: An IEEE AASP challenge , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
[35] Trevor Darrell,et al. Constrained Convolutional Neural Networks for Weakly Supervised Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[36] Tao Xiang,et al. In Defence of Negative Mining for Annotating Weakly Labelled Data , 2012, ECCV.
[37] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[38] Justin Salamon,et al. Adaptive Pooling Operators for Weakly Labeled Sound Event Detection , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[39] Qiang Chen,et al. Network In Network , 2013, ICLR.
[40] Mark D. Plumbley,et al. Deep Neural Network Baseline for DCASE Challenge 2016 , 2016, DCASE.
[41] Yong Xu,et al. A Joint Separation-Classification Model for Sound Event Detection of Weakly Labelled Data , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Yong Xu,et al. Audio Set Classification with Attention Model: A Probabilistic Perspective , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[43] Aren Jansen,et al. CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[44] Tomás Lozano-Pérez,et al. A Framework for Multiple-Instance Learning , 1997, NIPS.
[45] VirtanenTuomas,et al. Detection and Classification of Acoustic Scenes and Events , 2018 .
[46] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[47] Fei-Fei Li,et al. Discriminative Segment Annotation in Weakly Labeled Video , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[48] Tuomas Virtanen,et al. A multi-device dataset for urban acoustic scene classification , 2018, DCASE.
[49] DeLiang Wang,et al. Towards Scaling Up Classification-Based Speech Separation , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[50] Florian Metze,et al. Multiple Instance Deep Learning for Weakly Supervised Audio Event Detection , 2017, ArXiv.
[51] Bolei Zhou,et al. Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Tuomas Virtanen,et al. Sound event detection using weakly labeled dataset with stacked convolutional and recurrent neural network , 2017, ArXiv.
[53] Jyh-Shing Roger Jang,et al. FRAMECNN : A WEAKLY-SUPERVISED LEARNING FRAMEWORK FOR FRAME-WISE ACOUSTIC EVENT DETECTION AND CLASSIFICATION , 2017 .
[54] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[55] Yong Xu,et al. DCASE 2018 Challenge baseline with convolutional neural networks , 2018, ArXiv.
[56] Christoph H. Lampert,et al. Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation , 2016, ECCV.
[57] Thomas Hofmann,et al. Support Vector Machines for Multiple-Instance Learning , 2002, NIPS.
[58] Yi Yang,et al. A discriminative CNN video representation for event detection , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Zhi-Hua Zhou,et al. Neural Networks for Multi-Instance Learning , 2002 .
[60] Ankit Shah,et al. DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System , 2017, DCASE.
[61] Jon Barker,et al. Chime-home: A dataset for sound source recognition in a domestic environment , 2015, 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
[62] Qiang Huang,et al. Convolutional gated recurrent neural network incorporating spatial features for audio tagging , 2017, 2017 International Joint Conference on Neural Networks (IJCNN).
[63] Thomas G. Dietterich,et al. Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..
[64] Jaume Amores,et al. Multiple instance classification: Review, taxonomy and comparative study , 2013, Artif. Intell..