Long-distance Detection of Bioacoustic Events with Per-channel Energy Normalization

This paper proposes to detect bioacoustic events without supervision by pooling the magnitudes of spectrogram frames after per-channel energy normalization (PCEN). Although PCEN was originally developed for speech recognition, it also enhances animal vocalizations, even in the presence of atmospheric absorption and intermittent noise. We prove that PCEN generalizes logarithm-based spectral flux while adding a tunable time scale for background noise estimation. Compared with the pointwise logarithm, PCEN reduces the false alarm rate by 50x in the near field and 5x in the far field, on both avian and marine bioacoustic datasets. These improvements come at moderate computational cost and require no human intervention, which makes PCEN a promising tool for bioacoustics.
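To make the pipeline described above concrete, here is a minimal, dependency-free sketch of PCEN followed by pooling across channels. It follows the commonly published form of PCEN: a first-order recursive smoother M estimates the background in each channel, and the output is (E / (eps + M)^alpha + delta)^r - delta^r. The parameter values, the choice of max pooling, the initialization of the smoother, and the function names `pcen`, `detection_function`, and `detect` are illustrative assumptions, not the settings or code used in the paper.

```python
def pcen(frames, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Apply per-channel energy normalization to spectrogram frames.

    frames: list of frames, each a list of nonnegative magnitudes (one per channel).
    s:      smoothing coefficient; smaller values give a longer background time scale.
    All default values are illustrative assumptions.
    """
    if not frames:
        return []
    out = []
    # Assumption: the background estimate starts at the first frame.
    smooth = list(frames[0])
    for i, frame in enumerate(frames):
        if i > 0:
            # First-order IIR low-pass filter, run independently in each channel.
            smooth = [(1 - s) * m + s * e for m, e in zip(smooth, frame)]
        # Adaptive gain control followed by dynamic range compression.
        out.append([(e / (eps + m) ** alpha + delta) ** r - delta ** r
                    for e, m in zip(frame, smooth)])
    return out


def detection_function(frames, **params):
    """Pool PCEN magnitudes across channels (max pooling is an assumption)."""
    return [max(frame) for frame in pcen(frames, **params)]


def detect(frames, threshold, **params):
    """Return indices of frames whose pooled PCEN value exceeds a hypothetical threshold."""
    return [t for t, v in enumerate(detection_function(frames, **params))
            if v > threshold]
```

For instance, on a synthetic spectrogram with a stationary background and a single loud transient, `detect(frames, threshold=1.0)` would be expected to flag only the frame containing the transient. Because the background estimate divides out slowly varying energy, the output is largely insensitive to the overall recording gain.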
