Paper information - Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018)

This paper introduces the acoustic scene classification task of the DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task, and evaluates the performance of a baseline system on the task. As in previous years of the challenge, the task is defined as the classification of short audio samples into one of a set of predefined acoustic scene classes, using a supervised, closed-set classification setup. The newly recorded TUT Urban Acoustic Scenes 2018 dataset consists of ten different acoustic scenes and was recorded in six large European cities; it therefore has higher acoustic variability than the previous datasets used for this task. In addition to high-quality binaural recordings, it also includes data recorded with mobile devices. We also present the baseline system, which consists of a convolutional neural network, and its performance in the subtasks using the recommended cross-validation setup.
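
As a rough illustration of what a closed-set decision means here, the sketch below picks the highest-scoring class out of a fixed list of ten, with no "unknown" option. It is not the baseline system: the label names, the function name `classify_closed_set`, and the score vector are all hypothetical placeholders, and the scores are assumed to come from some classifier such as a convolutional neural network.

```python
from typing import Sequence

# Hypothetical placeholder labels; the dataset defines its own ten scene names.
SCENE_CLASSES = [f"scene_{i}" for i in range(10)]


def classify_closed_set(
    scores: Sequence[float],
    classes: Sequence[str] = SCENE_CLASSES,
) -> str:
    """Return the label with the highest score.

    Closed-set assumption: every input belongs to exactly one of the
    predefined classes, so the decision is a plain argmax over the scores.
    """
    if len(scores) != len(classes):
        raise ValueError("expected exactly one score per class")
    # Index of the largest score; ties resolve to the first maximum.
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return classes[best_index]
```

In an open-set setup, by contrast, the decision rule would also need a way to reject samples that match none of the known classes, for example a score threshold.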


[115] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[116] George Trigeorgis,et al. Domain Separation Networks , 2016, NIPS.

[117] Woon-Seng Gan,et al. Fast Continuous Acquisition of HRTF for Human Subjects with Unconstrained Random Head Movements in Azimuth and Elevation , 2016 .

[118] Justin Salamon,et al. Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification , 2016, IEEE Signal Processing Letters.

[119] Carlo Drioli,et al. Organizing a sonic space through vocal imitations , 2016 .

[120] YICHI ZHANG,et al. Supervised and Unsupervised Sound Retrieval by Vocal Imitation , 2016 .

[121] Hervé Glotin,et al. Bird detection in audio: A survey and a challenge , 2016, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).

[122] Tuomas Virtanen,et al. TUT database for acoustic scene classification and sound event detection , 2016, 2016 24th European Signal Processing Conference (EUSIPCO).

[123] E. Stalidzans,et al. Remote detection of the swarming of honey bee colonies by single-point temperature monitoring , 2016 .

[124] Ilyas Potamitis,et al. Measuring the fundamental frequency and the harmonic properties of the wingbeat of a large number of mosquitoes in flight using 2D optoacoustic sensors , 2016 .

[125] Kenneth O. Stanley,et al. Simple Evolutionary Optimization Can Rival Stochastic Gradient Descent in Neural Networks , 2016, GECCO.

[126] Qiang Huang,et al. Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[127] Huy Phan,et al. CNN-LTE: a Class of 1-X Pooling Convolutional Neural Networks on Label Tree Embeddings for Audio Scene Recognition , 2016 .

[128] Huy Phan,et al. CaR-FOREST: Joint Classification-Regression Decision Forests for Overlapping Audio Event Detection , 2016, ArXiv.

[129] Qiang Huang,et al. Fully DNN-Based Multi-Label Regression for Audio Tagging , 2016, DCASE.

[130] David Pfau,et al. Convolution by Evolution: Differentiable Pattern Producing Networks , 2016, GECCO.

[131] Qin Jin,et al. Video Description Generation using Audio and Visual Cues , 2016, ICMR.

[132] Mark B. Sandler,et al. Automatic Tagging Using Deep Convolutional Neural Networks , 2016, ISMIR.

[133] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[134] Annamaria Mesaros,et al. Metrics for Polyphonic Sound Event Detection , 2016 .

[135] Andrew M. Dai,et al. Virtual Adversarial Training for Semi-Supervised Text Classification , 2016, ArXiv.

[136] Florian Krebs,et al. madmom: A New Python Audio and Music Signal Processing Library , 2016, ACM Multimedia.

[137] Bhiksha Raj,et al. Audio Event Detection using Weakly Labeled Data , 2016, ACM Multimedia.

[138] Luc Van Gool,et al. Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Detection , 2016, ArXiv.

[139] Huy Phan,et al. Robust Audio Event Recognition with 1-Max Pooling Convolutional Neural Networks , 2016, INTERSPEECH.

[140] Pascal Fua,et al. Beyond Sharing Weights for Deep Domain Adaptation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[141] Zhiyao Duan,et al. IMISOUND: An Unsupervised System for Sound Query by Vocal Imitation , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[142] Heikki Huttunen,et al. Recurrent neural networks for polyphonic sound event detection in real life recordings , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[143] Kong-Aik Lee,et al. An extensible speaker identification sidekit in Python , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[144] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[145] Tianqi Chen,et al. XGBoost: A Scalable Tree Boosting System , 2016, KDD.

[146] Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[147] Rainer Martin,et al. Optimization of amplitude modulation features for low-resource acoustic scene classification , 2015, 2015 23rd European Signal Processing Conference (EUSIPCO).

[148] Bowen Zhou,et al. ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs , 2015, TACL.

[149] Bolei Zhou,et al. Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[150] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[151] Heikki Huttunen,et al. Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[152] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[153] Jon Barker,et al. Chime-home: A dataset for sound source recognition in a domestic environment , 2015, 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[154] Saurabh Singh,et al. Where to Look: Focus Regions for Visual Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[155] Wei Xu,et al. ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering , 2015, ArXiv.

[156] Kate Saenko,et al. Return of Frustratingly Easy Domain Adaptation , 2015, AAAI.

[157] Ruslan Salakhutdinov,et al. Action Recognition using Visual Attention , 2015, NIPS 2015.

[158] Karol J. Piczak. Environmental sound classification with convolutional neural networks , 2015, 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP).

[159] Tieniu Tan,et al. A Light CNN for Deep Face Representation With Noisy Labels , 2015, IEEE Transactions on Information Forensics and Security.

[160] John H. L. Hansen,et al. Speaker Recognition by Machines and Humans: A tutorial review , 2015, IEEE Signal Processing Magazine.

[161] Karol J. Piczak. ESC: Dataset for Environmental Sound Classification , 2015, ACM Multimedia.

[162] Zhiyao Duan,et al. Retrieving sounds by vocal imitation recognition , 2015, 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP).

[163] Alain Rakotomamonjy,et al. Histogram of gradients of Time-Frequency Representations for Audio scene detection , 2015, ArXiv.

[164] Germán Castellanos-Domínguez,et al. Multiple Instance Learning-Based Birdsong Classification Using Unsupervised Recording Segmentation , 2015, IJCAI.

[165] Heikki Huttunen,et al. Polyphonic sound event detection using multi label deep neural networks , 2015, 2015 International Joint Conference on Neural Networks (IJCNN).

[166] Wojciech Zaremba,et al. An Empirical Exploration of Recurrent Network Architectures , 2015, ICML.

[167] Ronald M. Summers,et al. DeepOrgan: Multi-level Deep Convolutional Networks for Automated Pancreas Segmentation , 2015, MICCAI.

[168] Dan Stowell,et al. Detection and Classification of Acoustic Scenes and Events , 2015, IEEE Transactions on Multimedia.

[169] Dimitri Palaz,et al. Convolutional Neural Networks-based continuous speech recognition using raw speech signal , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[170] Justin Salamon,et al. Unsupervised feature learning for urban sound classification , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[171] Onur Dikmen,et al. Sound event detection in real life recordings using coupled matrix factorization of spectral representations and class activity annotations , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[172] Yan Song,et al. Robust sound event recognition using convolutional neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[173] Guy J. Brown,et al. A machine-hearing system exploiting head movements for binaural sound localisation in reverberant conditions , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[174] Jon Froehlich,et al. Head-Mounted Display Visualizations to Support Sound Awareness for the Deaf and Hard of Hearing , 2015, CHI.

[175] Bryan Pardo,et al. VocalSketch: Vocally Imitating Audio Concepts , 2015, CHI.

[176] Young Man Ko,et al. Collective Archiving of Soundscapes in Socio-Cultural Context , 2015 .

[177] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[178] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[179] Minyoung Kim,et al. Deep Clustered Convolutional Kernels , 2015, FE@NIPS.

[180] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[181] Ee-Leng Tan,et al. Natural Sound Rendering for Headphones: Integration of signal processing techniques , 2015, IEEE Signal Processing Magazine.

[182] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[183] Durand R. Begault,et al. Inter-Laboratory Round Robin HRTF Measurement Comparison , 2015, IEEE Journal of Selected Topics in Signal Processing.

[184] Davide Rocchesso,et al. Sketching sound with voice and gesture , 2015, Interactions.

[185] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[186] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[187] Francis Rumsey,et al. Spatial Audio: Binaural Challenges , 2014 .

[188] Dan Stowell,et al. Acoustic Scene Classification: Classifying environments from the sounds they produce , 2014, IEEE Signal Processing Magazine.

[189] Thushara D. Abhayapala,et al. Binaural localization of speech sources in the median plane using cepstral hrtf extraction , 2014, 2014 22nd European Signal Processing Conference (EUSIPCO).

[190] Bryan Pardo,et al. SynthAssist: an audio synthesizer programmed with vocal imitation , 2014, ACM Multimedia.

[191] Justin Salamon,et al. A Dataset and Taxonomy for Urban Sound Research , 2014, ACM Multimedia.

[192] Luís A. Alexandre,et al. Weighted Convolutional Neural Network Ensemble , 2014, CIARP.

[193] James E. Allen,et al. Highly evolvable malaria vectors: The genomes of 16 Anopheles mosquitoes , 2014, Science.

[194] Gerald Penn,et al. Convolutional Neural Networks for Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[195] Vittorio Murino,et al. Audio Surveillance: a Systematic Review , 2014 .

[196] Victor S. Lempitsky,et al. Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[197] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[198] Markus Schedl,et al. Music Information Retrieval: Recent Developments and Applications , 2014, Found. Trends Inf. Retr..

[199] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[200] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[201] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[202] Florian Metze,et al. Improved audio features for large-scale multimedia event detection , 2014, 2014 IEEE International Conference on Multimedia and Expo (ICME).

[203] Aaron C. Courville,et al. Generative Adversarial Nets , 2014, NIPS.

[204] Joan Bruna,et al. Training Convolutional Networks with Noisy Labels , 2014, ICLR 2014.

[205] Rob Fergus,et al. Learning from Noisy Labels with Deep Neural Networks , 2014, ICLR.

[206] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[207] M. Verleysen,et al. Classification in the Presence of Label Noise: A Survey , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[208] Yang Song,et al. Learning Fine-Grained Image Similarity with Deep Ranking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[209] Wen Zhang,et al. Binaural sound source localization using the frequency diversity of the head-related transfer function. , 2014, The Journal of the Acoustical Society of America.

[210] Jordi Janer,et al. Sound Retrieval From Voice Imitation Queries In Collaborative Databases , 2014, Semantic Audio.

[211] Rich Caruana,et al. Do Deep Nets Really Need to be Deep? , 2013, NIPS.

[212] Qiang Chen,et al. Network In Network , 2013, ICLR.

[213] Tinne Tuytelaars,et al. Unsupervised Visual Domain Adaptation Using Subspace Alignment , 2013, 2013 IEEE International Conference on Computer Vision.

[214] Björn W. Schuller,et al. Recent developments in openSMILE, the munich open-source multimedia feature extractor , 2013, ACM Multimedia.

[215] Xavier Serra,et al. Freesound technical demo , 2013, ACM Multimedia.

[216] Björn W. Schuller,et al. Large-scale audio feature extraction and SVM for acoustic scene classification , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[217] Flemming Christensen,et al. Insert earphone calibration for hear-through options , 2013 .

[218] Dong Yu,et al. Exploring convolutional neural network structures and optimization techniques for speech recognition , 2013, INTERSPEECH.

[219] Jaume Amores,et al. Multiple instance classification: Review, taxonomy and comparative study , 2013, Artif. Intell..

[220] Yongqiang Wang,et al. An investigation of deep neural networks for noise robust speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[221] Geoffrey Zweig,et al. Recent advances in deep learning for speech research at Microsoft , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[222] Julien Pinquier,et al. Water sound recognition based on physical models , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[223] Arkady B. Zaslavsky,et al. Context Aware Computing for The Internet of Things: A Survey , 2013, IEEE Communications Surveys & Tutorials.

[224] G. Killeen,et al. Standardizing operational vector sampling techniques for measuring malaria transmission intensity: evaluation of six mosquito collection methods in western Kenya , 2013, Malaria Journal.

[225] DeLiang Wang,et al. Binaural Detection, Localization, and Segregation in Reverberant Environments Based on Joint Pitch and Azimuth Cues , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[226] A. Mesaros,et al. Context-dependent sound event detection , 2013, EURASIP J. Audio Speech Music. Process..

[227] Mohamed S. Kamel,et al. Cross-Domain Facial Expression Recognition Using Supervised Kernel Mean Matching , 2012, 2012 11th International Conference on Machine Learning and Applications.

[228] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[229] Bryan Pardo,et al. Music/Voice Separation Using the Similarity Matrix , 2012, ISMIR.

[230] Hao Shen,et al. HRTF-based localization and separation of multiple sound sources , 2012, 2012 IEEE RO-MAN: The 21st IEEE International Symposium on Robot and Human Interactive Communication.

[231] L. Deng,et al. The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web] , 2012, IEEE Signal Processing Magazine.

[232] Alexander Lerch,et al. An Introduction to Audio Content Analysis: Applications in Signal Processing and Music Informatics , 2012 .

[233] Catherine P. Ortega,et al. Chapter 2: Effects of Noise Pollution on Birds: A Brief Review of Our Knowledge , 2012 .

[234] Kilian Q. Weinberger,et al. Marginalized Denoising Autoencoders for Domain Adaptation , 2012, ICML.

[235] Xiaoli Z. Fern,et al. Acoustic classification of multiple simultaneous bird species: a multi-instance multi-label approach , 2012, The Journal of the Acoustical Society of America.

[236] Stephan Gerlach,et al. Acoustic Monitoring and Localization for Social Care , 2012, J. Comput. Sci. Eng..

[237] Daniel Garcia-Romero,et al. Multicondition training of Gaussian PLDA models in i-vector space for noise and reverberation robust speaker recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[238] G. Lemaitre,et al. A lexical analysis of environmental sound categories , 2012, Journal of Experimental Psychology: Applied.

[239] Björn W. Schuller,et al. Semi-supervised learning helps in sound event classification , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[240] Yoshua Bengio,et al. Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach , 2011, ICML.

[241] Ian H. Witten,et al. Issues in Stacked Generalization , 2011, J. Artif. Intell. Res..

[242] Ruimin Hu,et al. Binaural Moving Sound Source Localization by Joint Estimation of ITD and ILD , 2011 .

[243] Patrick Kenny,et al. Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[244] Sarah L. Dumyahn,et al. What is soundscape ecology? An introduction and overview of an emerging new science , 2011, Landscape Ecology.

[245] Volker Hohmann,et al. Auditory model based direction estimation of concurrent speakers from binaural signals , 2011, Speech Commun..

[246] Andrei Lucian,et al. Identification of the honey bee swarming process by analysing the time course of hive vibrations , 2011 .

[247] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[248] Juan Jose Gomez-Alfageme,et al. Estimation of the direction of auditory events in the median plane , 2010 .

[249] Anthony G. Cohn,et al. Discovering an Event Taxonomy from Video using Qualitative Spatio-temporal Graphs , 2010, ECAI.

[250] Yann LeCun,et al. Convolutional networks and applications in vision , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.

[251] Tuomas Virtanen,et al. Acoustic event detection in real life recordings , 2010, 2010 18th European Signal Processing Conference.

[252] Zheng-Hua Tan,et al. Low-Complexity Variable Frame Rate Analysis for Speech Recognition and Voice Activity Detection , 2010, IEEE Journal of Selected Topics in Signal Processing.

[253] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[254] James R. Foulds,et al. A review of multi-instance learning assumptions , 2010, The Knowledge Engineering Review.

[255] Kuntoro Adi,et al. Acoustic censusing using automatic vocalization classification and identity recognition , 2010, The Journal of the Acoustical Society of America.

[256] Ching-Yung Lin,et al. Healthcare audio event classification using Hidden Markov Models and Hierarchical Hidden Markov Models , 2009, 2009 IEEE International Conference on Multimedia and Expo.

[257] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[258] Karsten M. Borgwardt,et al. Covariate Shift by Kernel Mean Matching , 2009, NIPS.

[259] F. Keyrouz,et al. Real time humanoid sound source localization and tracking in a highly reverberant environment , 2008, 2008 9th International Conference on Signal Processing.

[260] D. Berckmans,et al. Monitoring of swarming sounds in bee hives for early detection of the swarming period , 2008 .

[261] Augusto Sarti,et al. Scream and gunshot detection and localization for audio-surveillance systems , 2007, 2007 IEEE Conference on Advanced Video and Signal Based Surveillance.

[262] Kazuhiro Iida,et al. Median plane localization using a parametric model of the head-related transfer function based on spectral cues , 2007 .