Intelligent Audio Analysis for Continuous Rainforest Occupancy Monitoring

Ecologists use auditory data for a variety of purposes, including identifying species ranges, estimating population sizes, and studying behaviour. Autonomous recording units (ARUs) enable auditory data collection over a wider area and time frame, and can provide improved consistency over traditional sampling methods. Moreover, in enabling passive acoustic monitoring, they mitigate the impacts of field surveys on sensitive landscapes and species. The result is an abundance of audio data, much more than can be analyzed continuously by scientists with the appropriate taxonomic skills. In recent years, ecologists have begun collaborating with computer scientists to develop machine learning tools to make better use of this data. Deep learning methods are effective for species classification, but still face a number of challenges in processing ARU data, such as handling changing environmental conditions, differentiating between co-occurring animal sounds, and detecting vocalizations in long streams of audio. In this project, I examine and address the divide between academic machine learning research on animal vocalization classifiers and the application of those classifiers to conservation efforts. As a unique case study, I build a Bornean gibbon (Hylobates muelleri) call detection system for the Stability of Altered Forest Ecosystems Project's solar-powered, mobile-connected ARU network. First, I design a GUI-based method to rapidly annotate continuous audio for supervised learning. Next, I experiment with three fundamentally different forms of feature extraction and learning: end-to-end modeling, low-level descriptor (LLD) extraction with Bag-of-Audio-Words modeling, and deep learning using 2D spectral representations. Last, I examine ways to adapt the models, and their outputs, for ecological occupancy monitoring.
With a top ROC-AUC score of 0.9848, the resulting models perform strongly on independent test data and showcase the potential for machine learning to transform ecological monitoring, even in dynamic and biodiverse environments. More immediately, the project provides a new tool for endangered primate conservation and a foundation for extension to other rainforest species.
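
For readers unfamiliar with the metric, ROC-AUC can be read as the probability that a randomly chosen positive clip (one containing a call) receives a higher detector score than a randomly chosen negative clip. The following is a minimal sketch of that definition only, not the project's evaluation code; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 ground-truth values (1 = call present).
    scores: iterable of detector scores, higher meaning "more likely a call".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): four clips, two with calls and two without.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

The pairwise loop is quadratic in the number of clips, which is fine for illustration; a rank-based formulation would be preferable for large test sets.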
