Active learning for classifying long‐duration audio recordings of the environment

1. This paper presents an active learning framework for classifying one-minute audio segments derived from long-duration recordings of the environment. The goal is to investigate how effectively active learning reduces the manual annotation effort required to label a large volume of acoustic data according to its dominant sound source, while keeping the quality of the automatically labelled data high.

2. Through extensive simulation experiments, we empirically compare a range of active learning approaches against a Random Sampling baseline for soundscape classification. Random Forest is used as the benchmark supervised approach for building classifiers within the active learning framework. Twelve summary indices, extracted for each one-minute segment of a 13-month recording, are used as features for training the classifiers.

3. Our experimental findings demonstrate that (i) among existing query strategies, those based on classifier confidence and on the diversity of samples are more effective for very large datasets in which the classes are imbalanced in size; and (ii) for a practical target performance (i.e., an F-measure equal to or greater than 0.8, 0.85, or 0.9), only 5-16 hours of manual annotation effort is required to build a classifier that automatically annotates a large amount (13 months) of unlabelled audio data.

4. Active learning has a key role to play in alleviating the burden of manual annotation required to build classifiers that can support effective monitoring of species diversity in at-risk ecosystems.
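
As an illustration of the pool-based loop that such a framework relies on, the following Python sketch repeatedly trains a classifier on the labelled set, scores the unlabelled pool by classifier confidence, and asks an annotator to label the least confident items. It is not the authors' implementation: a toy nearest-centroid classifier stands in for Random Forest, least-confidence sampling stands in for the confidence-based query strategies, and the two-dimensional synthetic features, the class names "birds" and "rain", and all function names are hypothetical.

```python
import math
import random


def least_confidence(probabilities):
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probabilities)


def select_queries(pool_probabilities, batch_size):
    """Return positions of the batch_size pool items with the highest uncertainty."""
    ranked = sorted(
        range(len(pool_probabilities)),
        key=lambda i: least_confidence(pool_probabilities[i]),
        reverse=True,
    )
    return ranked[:batch_size]


class CentroidClassifier:
    """Toy stand-in for Random Forest (assumption): one centroid per class."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.centroids = []
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict_proba(self, x):
        distances = [math.dist(x, c) for c in self.centroids]
        nearest = min(distances)
        # Softmax over negative distances, shifted so the nearest class has weight 1.
        weights = [math.exp(-(d - nearest)) for d in distances]
        total = sum(weights)
        return [w / total for w in weights]


def active_learning(X, oracle, seed_indices, batch_size, rounds):
    """Pool-based loop: train, score the unlabelled pool, query the annotator."""
    labelled = {i: oracle(i) for i in seed_indices}
    for _ in range(rounds):
        indices = list(labelled)
        model = CentroidClassifier().fit(
            [X[i] for i in indices], [labelled[i] for i in indices]
        )
        pool = [i for i in range(len(X)) if i not in labelled]
        if not pool:
            break
        probabilities = [model.predict_proba(X[i]) for i in pool]
        for position in select_queries(probabilities, batch_size):
            labelled[pool[position]] = oracle(pool[position])
    return labelled


# Synthetic two-class data (hypothetical); the oracle simulates a human annotator.
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(50)]
X += [[random.gauss(3, 1), random.gauss(3, 1)] for _ in range(50)]
truth = ["birds"] * 50 + ["rain"] * 50

labelled = active_learning(
    X, oracle=lambda i: truth[i], seed_indices=[0, 50], batch_size=5, rounds=4
)
print(len(labelled), "segments labelled by the simulated annotator")
```

A Random Sampling baseline can be obtained from the same loop by replacing `select_queries` with a random draw from the pool, which makes it possible to compare the annotation effort each strategy needs to reach a given target performance.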
