Segmentation, Classification, and Visualization of Orca Calls Using Deep Learning

Audiovisual media are increasingly used to study the communication and behavior of animal groups, e.g. by placing microphones in the animals' habitat. This produces huge datasets in which only a small fraction of the recording contains animal interactions. The Orcalab has recorded orcas since 1973 using stationary underwater hydrophones and has made the recordings publicly available on the Orchive. The archive contains over 15 000 manually extracted orca/noise annotations and about 20 000 h of unseen audio data. To analyze the behavior and communication of killer whales, we need to interpret the different call types. In this work, we present a two-stage classification approach that uses the labeled call/noise files and a small number of labeled call-type files. Results indicate an accuracy of 95.0 % for call segmentation and 87 % for classification of 12 call classes. We further visualize the learned orca call representations in the convolutional neural network (CNN) activations to illustrate the potential of CNN-based recognition for bioacoustic signals.
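The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the list-of-floats segment representation, and the toy energy-based detector are all assumptions made for illustration. In the paper both stages are CNNs, which are replaced here by simple placeholder callables so that the control flow is easy to follow.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical representation: one audio segment as a sequence of samples.
Segment = Sequence[float]


def two_stage_classify(
    segments: List[Segment],
    is_call: Callable[[Segment], bool],
    call_type: Callable[[Segment], str],
) -> List[Optional[str]]:
    """Label each segment with a call type, or None if stage 1 says noise."""
    labels: List[Optional[str]] = []
    for seg in segments:
        if is_call(seg):
            # Stage 2 only runs on segments that stage 1 accepted as calls.
            labels.append(call_type(seg))
        else:
            labels.append(None)
    return labels


# Toy stand-ins for the two models (assumptions, not trained networks).
def toy_is_call(seg: Segment) -> bool:
    # Treat segments with mean energy above a fixed threshold as calls.
    return sum(x * x for x in seg) / len(seg) > 0.1


def toy_call_type(seg: Segment) -> str:
    # Arbitrary rule that yields two placeholder call-type labels.
    return "type_A" if seg[0] > 0 else "type_B"


if __name__ == "__main__":
    demo = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
    print(two_stage_classify(demo, toy_is_call, toy_call_type))
```

Separating the stages this way lets the large call/noise annotation set train the detector, while the much smaller call-type set is only needed for the second model.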
