Adaptive DCTNet for audio signal classification

In this paper, we investigate DCTNet for audio signal classification. Its output features are related to Cohen's class of time-frequency distributions. We introduce the adaptive DCTNet (A-DCTNet) for audio feature extraction. The A-DCTNet applies the idea of the constant-Q transform, with the center frequencies of its filterbank geometrically spaced. The A-DCTNet adapts to different acoustic scales, and it captures the low-frequency acoustic information to which human auditory perception is sensitive better than features such as Mel-frequency spectral coefficients (MFSC). We use the features extracted by the A-DCTNet as input to classifiers. Experimental results show that the A-DCTNet combined with a Recurrent Neural Network (RNN) achieves a state-of-the-art classification rate on bird song data and improves artist identification accuracy on music data. These results demonstrate the A-DCTNet's applicability to signal processing problems.
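As a minimal illustration of the geometric spacing that the constant-Q idea implies (the function name, parameter choices, and example frequencies below are ours for exposition, not taken from the paper), the following Python sketch computes filterbank center frequencies in which adjacent bins keep a constant frequency ratio, so the ratio of center frequency to bandwidth stays fixed:

    import numpy as np

    def constant_q_center_frequencies(f_min, f_max, bins_per_octave):
        """Center frequencies f_k = f_min * 2**(k / bins_per_octave), up to f_max."""
        n_octaves = np.log2(f_max / f_min)               # octaves spanned by the range
        n_bins = int(np.floor(n_octaves * bins_per_octave)) + 1
        k = np.arange(n_bins)
        return f_min * 2.0 ** (k / bins_per_octave)

    # Example: 12 bins per octave from C1 (~32.7 Hz) up to 8 kHz.
    freqs = constant_q_center_frequencies(32.7, 8000.0, 12)
    print(freqs[:4])  # ~[32.70, 34.65, 36.71, 38.89]; adjacent bins differ by 2**(1/12)

With 12 bins per octave, adjacent center frequencies differ by a factor of 2**(1/12), the semitone spacing of Western music; Mel-scale filterbanks, by contrast, are spaced approximately linearly below about 1 kHz, which is one reason a geometrically spaced filterbank allocates relatively more bins to the low-frequency region.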
