Paper information - Spline Filters For End-to-End Deep Learning

We propose to tackle end-to-end learning on raw waveform signals by introducing learnable continuous time-frequency atoms. These filters are derived by defining a functional space with a given smoothness order and boundary conditions; from this space, we derive parametric analytical filters. Because the filters are differentiable, they can be trained by gradient-based optimization and combined with any Deep Neural Network (DNN). This enables us to tackle, in a front-end fashion, a large-scale bird detection task based on the freefield1010 dataset, which is known to pose key challenges such as the dimensionality of the input data (> 100,000) and the presence of additional noise: multiple sources and soundscapes.
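The abstract does not spell out how the filters are built, so the following is only a minimal sketch of the general idea, under assumptions that are not taken from the text above: it uses a cubic Hermite spline (one possible choice of smoothness order), pins the value and slope to zero at both ends (one possible boundary condition), and samples the resulting continuous function on a regular grid before convolving it with a waveform. The names `hermite_filter`, `build_filter`, and `samples_per_segment` are hypothetical.

```python
import numpy as np

def hermite_filter(values, slopes, samples_per_segment):
    """Sample a piecewise-cubic Hermite spline given knot values and slopes.

    Knots sit at integer positions 0..K-1 (an illustrative choice).
    """
    values = np.asarray(values, dtype=float)
    slopes = np.asarray(slopes, dtype=float)
    # Local coordinate t in [0, 1) inside each segment between two knots.
    t = np.arange(samples_per_segment) / samples_per_segment
    # Standard cubic Hermite basis functions.
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    segments = [
        h00 * values[k] + h10 * slopes[k]
        + h01 * values[k + 1] + h11 * slopes[k + 1]
        for k in range(len(values) - 1)
    ]
    # Append the last knot so the filter ends exactly on the boundary value.
    return np.concatenate(segments + [values[-1:]])

def build_filter(inner_values, inner_slopes, samples_per_segment=8):
    """Build a filter whose value and slope are zero at both boundaries."""
    # Assumed boundary condition: pad the learnable parameters with zeros.
    values = np.concatenate([[0.0], inner_values, [0.0]])
    slopes = np.concatenate([[0.0], inner_slopes, [0.0]])
    return hermite_filter(values, slopes, samples_per_segment)

rng = np.random.default_rng(0)
theta_values = rng.normal(size=5)   # stand-ins for learnable knot values
theta_slopes = rng.normal(size=5)   # stand-ins for learnable knot slopes
kernel = build_filter(theta_values, theta_slopes)

waveform = rng.normal(size=1000)    # synthetic stand-in for raw audio
response = np.convolve(waveform, kernel, mode="valid")
```

Each sampled filter value here is a linear combination of the knot parameters, so the filter output is differentiable with respect to them; in an automatic-differentiation framework the same construction could be trained jointly with the network that follows it.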
