Fast Chirplet Transform feeding CNN, application to orca and bird bioacoustics

Advanced soundscape analysis and machine listening require efficient time-frequency decompositions. Recent scattering theory offers a robust hierarchical convolutional decomposition, but its kernels must be fixed in advance. A CNN can be seen as learning the optimal kernel decomposition, but it requires a large amount of training data. This paper aims to show that Chirplet kernels provide a good constant-Q time-frequency representation that leads to better CNN classification than the usual log-Fourier representation. We first recall the main advantages of the Chirplet for bio-inspired auditory processing. The contributions of this paper are then: (1) we give a new fast implementation of the Chirplet transform with reduced complexity; (2) we validate the fast Chirplet computation in near real time on long series of recordings from online orca monitoring, and on bird songs from hundreds of bird species; (3) we demonstrate that the Chirplet improves convolutional neural network classification on a challenging task of complex, overlapping bird calls, compared to the usual Mel representation. Validation is conducted on a subset of the Amazon bird species of the LifeCLEF 2016 classification challenge.
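To make the notion of a Chirplet kernel concrete, the following is a minimal NumPy sketch of a single Gaussian chirplet atom and its correlation with a signal. It is not the fast implementation proposed in the paper: the function names, the Gaussian envelope, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def gaussian_chirplet(duration, sample_rate, f0, chirp_rate, sigma):
    """Unit-norm complex Gaussian chirplet centered in its window.

    Hypothetical helper for illustration. f0 is the center frequency (Hz),
    chirp_rate the linear frequency slope (Hz/s), sigma the envelope
    standard deviation (s).
    """
    n = int(round(duration * sample_rate))
    # Time axis centered on zero so that f0 is the frequency at the window center.
    t = (np.arange(n) - (n - 1) / 2) / sample_rate
    envelope = np.exp(-0.5 * (t / sigma) ** 2)
    # Quadratic phase gives an instantaneous frequency f0 + chirp_rate * t.
    phase = 2 * np.pi * (f0 * t + 0.5 * chirp_rate * t ** 2)
    atom = envelope * np.exp(1j * phase)
    return atom / np.linalg.norm(atom)


def chirplet_response(signal, atom):
    """Magnitude of the correlation between a signal and a chirplet atom."""
    # Correlation is convolution with the conjugated, time-reversed atom.
    return np.abs(np.convolve(signal, np.conj(atom[::-1]), mode="same"))


# Illustrative values (assumed): a real rising chirp sweeping 1 kHz to 3 kHz.
sample_rate = 8000
duration = 0.1
n = int(round(duration * sample_rate))
t = (np.arange(n) - (n - 1) / 2) / sample_rate
signal = np.cos(2 * np.pi * (2000 * t + 0.5 * 20000 * t ** 2))

matched = gaussian_chirplet(duration, sample_rate, 2000, 20000, 0.02)
flat = gaussian_chirplet(duration, sample_rate, 2000, 0, 0.02)  # chirp rate 0: a Gabor atom

resp_matched = chirplet_response(signal, matched)
resp_flat = chirplet_response(signal, flat)
```

In this sketch, the atom whose chirp rate matches the sweep of the signal responds more strongly than the zero-chirp atom, which is the intuition behind using chirplets for frequency-modulated calls.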
