Spline Filters For End-to-End Deep Learning

We tackle end-to-end learning from raw waveform signals by introducing learnable continuous time-frequency atoms. We derive these filters by first defining a functional space with a given smoothness order and boundary conditions, and then obtaining parametric analytical filters from that space. Because the filters are differentiable, they can be trained by gradient-based optimization and used with any Deep Neural Network (DNN). This lets us address, as a learnable front end, a large-scale bird detection task based on the freefield1010 dataset, which is known to pose key challenges such as the dimensionality of the input data (> 100,000) and the presence of additional noise: multiple sources and soundscapes.
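As a rough illustration of what a parametric, differentiable filter with fixed smoothness and boundary conditions can look like, here is a minimal sketch. It assumes a cubic Hermite spline whose interior knot values and slopes are the learnable parameters and whose two boundary knots are pinned to zero. The spline family, the function names (`hermite_basis`, `spline_filter`, `value_gradient`), and the uniform knot grid are illustrative assumptions, not details given in the abstract.

```python
def hermite_basis(t):
    """Cubic Hermite basis polynomials evaluated at t in [0, 1]."""
    h00 = 2 * t**3 - 3 * t**2 + 1   # weights the left knot value
    h10 = t**3 - 2 * t**2 + t       # weights the left knot slope
    h01 = -2 * t**3 + 3 * t**2      # weights the right knot value
    h11 = t**3 - t**2               # weights the right knot slope
    return h00, h10, h01, h11


def spline_filter(values, slopes, num_samples):
    """Sample a piecewise-cubic filter on [0, 1] from its knot parameters.

    `values` and `slopes` hold the interior knot parameters (hypothetical
    learnable weights). The boundary knots are fixed to value 0 and slope 0,
    an assumed boundary condition giving a compactly supported filter.
    """
    v = [0.0] + list(values) + [0.0]
    s = [0.0] + list(slopes) + [0.0]
    n_seg = len(v) - 1          # number of spline segments
    h = 1.0 / n_seg             # uniform knot spacing (assumption)
    out = []
    for i in range(num_samples):
        x = i / (num_samples - 1)
        k = min(int(x * n_seg), n_seg - 1)   # segment containing x
        t = x * n_seg - k                    # local coordinate in [0, 1]
        h00, h10, h01, h11 = hermite_basis(t)
        out.append(h00 * v[k] + h10 * h * s[k]
                   + h01 * v[k + 1] + h11 * h * s[k + 1])
    return out


def value_gradient(j, n_knots, num_samples):
    """Derivative of every filter sample with respect to knot value j.

    The filter is linear in its parameters, so this derivative is simply
    the filter generated by a one-hot value vector with zero slopes.
    """
    unit = [0.0] * n_knots
    unit[j] = 1.0
    return spline_filter(unit, [0.0] * n_knots, num_samples)


# Three interior knots, flat slopes, nine samples of the filter.
taps = spline_filter([1.0, -1.0, 0.5], [0.0, 0.0, 0.0], 9)
```

Because each sample is a smooth (here linear) function of the knot parameters, a loss computed after convolving the waveform with `taps` can be backpropagated to `values` and `slopes` like any other layer weight.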
