Wavelet Learning by Adaptive Hermite Cubic Splines applied to Bioacoustic Chirps

Acoustic monitoring is used to study marine mammals in the ocean. Because of the large quantity of recorded data, automated analysis of the captured sound is almost essential. Deep learning is an efficient approach; however, the acoustic features it uses are often not well adapted. A convolutional neural network can be seen as an optimal kernel decomposition, but it requires a large amount of training data to learn its kernels. An alternative that uses pre-imposed kernels, and therefore needs no data to learn them, is the scattering framework, which imposes wavelet filters as kernels. Our research focuses on adaptive time-frequency decomposition of bioacoustic signals, based on a learned cubic spline representation. We give the theoretical derivations of the model and demonstrate efficient real-world applications to various signals, including chirps in blue whale songs.
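To make the idea of a spline-parameterized filter concrete, the following is a minimal sketch, not the authors' implementation. It assumes a filter is described by a small set of knot values and knot slopes, which are the quantities a learning procedure could adjust, and it samples the resulting piecewise cubic Hermite spline on a regular grid. The function names `hermite_cubic` and `spline_filter`, the uniform knot placement on [0, 1], and the mean-removal and normalization steps are all illustrative assumptions.

```python
import numpy as np

def hermite_cubic(knots, values, slopes, t):
    """Evaluate a piecewise cubic Hermite spline at the points t."""
    knots = np.asarray(knots, dtype=float)
    values = np.asarray(values, dtype=float)
    slopes = np.asarray(slopes, dtype=float)
    t = np.asarray(t, dtype=float)
    # Index of the interval [knots[i], knots[i + 1]] that contains each t.
    i = np.clip(np.searchsorted(knots, t, side="right") - 1, 0, len(knots) - 2)
    h = knots[i + 1] - knots[i]
    u = (t - knots[i]) / h  # local coordinate in [0, 1]
    # Standard cubic Hermite basis functions.
    h00 = 2 * u**3 - 3 * u**2 + 1
    h10 = u**3 - 2 * u**2 + u
    h01 = -2 * u**3 + 3 * u**2
    h11 = u**3 - u**2
    return (h00 * values[i] + h10 * h * slopes[i]
            + h01 * values[i + 1] + h11 * h * slopes[i + 1])

def spline_filter(values, slopes, n_samples):
    """Sample a filter from knot values and slopes (hypothetical helper)."""
    knots = np.linspace(0.0, 1.0, len(values))  # assumed uniform knots
    t = np.linspace(0.0, 1.0, n_samples)
    w = hermite_cubic(knots, values, slopes, t)
    w = w - w.mean()              # samples sum to zero (assumed constraint)
    return w / np.linalg.norm(w)  # unit Euclidean norm

# Five knots; in a learned setting these values and slopes would be trainable.
values = [0.0, 1.0, -1.0, 0.5, 0.0]
slopes = [0.0, 0.0, 0.0, 0.0, 0.0]
w = spline_filter(values, slopes, 64)
```

Such a filter could then be applied to a recording with `np.convolve(signal, w, mode="same")`. The appeal of this kind of parameterization is that the filter shape is controlled by a handful of knot parameters rather than by one free weight per sample.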
