Semi-Supervised Active Learning for Sound Classification in Hybrid Learning Environments

Scarcity of labeled data is a common problem in sound classification tasks. Approaches to sound classification are commonly based on supervised learning algorithms, which require labeled data; such data are often scarce, which leads to models that do not generalize well. In this paper, we efficiently combine confidence-based Active Learning and Self-Training with the aim of minimizing the need for human annotation when training sound classification models. The proposed method pre-processes the instances available for labeling by computing their classifier confidence scores: candidates with low scores are delivered to human annotators, while those with high scores are labeled automatically by the machine. We demonstrate the feasibility and efficacy of this method in two practical scenarios: pool-based and stream-based processing. Extensive experimental results indicate that, in both scenarios, our approach requires significantly fewer labeled instances to reach the same performance as Passive Learning, Active Learning and Self-Training. On a sound classification task comprising 16,930 sound instances, the number of human-labeled instances is reduced by 52.2% in both the pool-based and the stream-based scenario.
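The confidence-based routing described above can be illustrated with a minimal sketch. It assumes that a classifier's confidence for an instance is the maximum of its posterior class probabilities. The function names, the batch sizes `n_human` and `n_machine`, and the thresholds `low` and `high` are hypothetical illustrations, not the settings used in the paper. In the pool-based case the whole unlabeled pool is ranked at once. In the stream-based case each arriving instance is decided on individually, and this sketch simply skips instances of intermediate confidence.

```python
from typing import List, Optional, Sequence, Tuple


def argmax(probs: Sequence[float]) -> int:
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


def confidence(probs: Sequence[float]) -> float:
    """Assumption: confidence is the maximum posterior probability."""
    return max(probs)


def route_pool(
    posteriors: Sequence[Sequence[float]],
    n_human: int,
    n_machine: int,
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Pool-based step (hypothetical helper).

    Ranks the unlabeled pool by ascending confidence. The n_human least
    confident instances go to human annotators. The n_machine most confident
    of the remaining instances are self-labeled with the predicted class.
    """
    ranked = sorted(range(len(posteriors)), key=lambda i: confidence(posteriors[i]))
    human = ranked[:n_human]
    remaining = ranked[n_human:]
    top = remaining[max(0, len(remaining) - n_machine):]
    machine = [(i, argmax(posteriors[i])) for i in top]
    return human, machine


def route_stream(
    probs: Sequence[float], low: float, high: float
) -> Tuple[str, Optional[int]]:
    """Stream-based step (hypothetical helper) for one arriving instance.

    Returns ("human", None) when confidence is below `low`,
    ("machine", label) when it is at least `high`, and
    ("discard", None) otherwise (a simplifying assumption of this sketch).
    """
    c = confidence(probs)
    if c < low:
        return "human", None
    if c >= high:
        return "machine", argmax(probs)
    return "discard", None
```

For a pool with posteriors `[[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.6, 0.4], [0.02, 0.98]]`, the call `route_pool(posteriors, 2, 2)` sends instances 1 and 3 to annotators and self-labels instance 0 as class 0 and instance 4 as class 1. After each round, the human- and machine-labeled instances would be added to the training set and the classifier retrained.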