DOCC10: Open access dataset of marine mammal transient studies and end-to-end CNN classification

Classification of transients is a difficult task. In bioacoustics, most studies still rely on human labeling. In passive acoustic monitoring (PAM), the data to label consist of months of continuous recordings from multiple recording stations, and labeling all of it by hand takes longer than the next recording session takes to produce new data, even with several experts. To help lay a foundation for the automatic labeling of marine mammal transients, we built a dataset from the weakly labeled 3 TB marine mammal transient dataset of DCLDE 2018, which was made for a click classification challenge. The new dataset has strong labels and is the basis of a new challenge, DOCC10, whose baseline is also described in this paper. The baseline's accuracy of 71% is already sufficient to curate the large dataset, leaving only some regions of interest for expert review. However, it is far from perfect, leaving room for future improvement and for competing alternative techniques. A smaller version of DOCC10, named DOCC7, is also presented.
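The abstract does not say how the baseline's predictions are turned into curated labels. One common approach, shown here only as a hypothetical sketch and not as the DOCC10 procedure, is to auto-accept predictions whose top softmax probability reaches a threshold and send the remaining clips to experts. The function names, the 0.9 threshold, and the toy three-class logits are assumptions for illustration; the code works for any number of classes, including the 10 of DOCC10.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(
    logits_per_clip: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical confidence cut-off, not taken from the paper
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split clips into auto-labeled (clip index, predicted class) pairs
    and a list of clip indices queued for expert review."""
    auto_labeled: List[Tuple[int, int]] = []
    for_review: List[int] = []
    for i, logits in enumerate(logits_per_clip):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            auto_labeled.append((i, best))  # confident enough to keep the prediction
        else:
            for_review.append(i)  # ambiguous clip: left for an expert
    return auto_labeled, for_review


def accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of clips whose predicted class matches the reference label."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(predicted)


if __name__ == "__main__":
    toy_logits = [[8.0, 0.0, 0.0], [0.1, 0.2, 0.0], [0.0, 0.0, 6.0]]
    auto, review = triage(toy_logits, threshold=0.9)
    print(auto)    # [(0, 0), (2, 2)]
    print(review)  # [1]
```

Raising the threshold sends more clips to experts but makes the auto-accepted labels more reliable; where to set it depends on how much expert time is available.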
