Deep Learning and Domain Transfer for Orca Vocalization Detection

In this paper, we study the difficulties of domain transfer when training deep learning models, on the specific task of orca vocalization detection. Deep learning appears to be an answer to many sound recognition tasks, in human speech analysis as well as in bioacoustics. It makes it possible to learn from large amounts of data and to find the best-scoring way to discriminate between classes (e.g. orca vocalizations and other sounds). However, to learn the perfect data representation and discrimination boundaries, all possible data configurations need to be processed. This causes problems when those configurations are ever-changing (e.g. in our experiment, a change in the recording system considerably disturbed our previously well-performing model). We thus explore approaches to compensate for the difficulties of domain transfer, with two convolutional neural network (CNN) architectures: one that works in the time-frequency domain, and one that works directly in the time domain.
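To make the distinction between the two input domains concrete, the following is a minimal sketch of the two views of the same signal: raw overlapping frames (time domain) and a Hann-windowed short-time DFT magnitude (time-frequency domain). It does not reproduce the networks or the preprocessing used in this work; the sample rate, tone frequency, frame length, hop size, and all function names are assumptions chosen only for illustration.

```python
import cmath
import math


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames (time-domain view)."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrogram(signal, frame_len, hop):
    """Hann-windowed short-time DFT magnitudes (time-frequency view)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    spec = []
    for frame in frame_signal(signal, frame_len, hop):
        windowed = [x * w for x, w in zip(frame, window)]
        bins = []
        # Keep only the non-negative frequencies of a real-valued signal.
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(windowed))
            bins.append(abs(acc))
        spec.append(bins)
    return spec


# Hypothetical toy input: a pure tone standing in for a recording.
SAMPLE_RATE = 8000  # Hz, assumed for illustration
TONE_HZ = 1000      # Hz, assumed for illustration
waveform = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE)
            for t in range(1024)]

# Time-frequency input: a 2-D array of (frames x frequency bins).
spec = magnitude_spectrogram(waveform, frame_len=64, hop=32)

# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

A model working in the time domain would consume `waveform` (or its frames) directly, while a model working in the time-frequency domain would consume `spec`. A change of recording system alters both representations, which is why domain transfer matters for either kind of model.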
