Randomly Weighted CNNs for (Music) Audio Classification

The computer vision literature shows that randomly weighted neural networks perform reasonably well as feature extractors. Following this idea, we study how non-trained (randomly weighted) convolutional neural networks perform as feature extractors for (music) audio classification tasks. We use features extracted from the embeddings of deep architectures as input to a classifier, with the goal of comparing classification accuracies across different randomly weighted architectures. Following this methodology, we run a comprehensive evaluation of current architectures for audio classification, and provide evidence that the architecture alone is an important piece for resolving (music) audio problems with deep neural networks.
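The core of the methodology can be illustrated with a minimal sketch: a single randomly weighted 1-D convolutional layer (never trained) acts as a fixed feature extractor over a raw waveform, and its pooled activations become the input to a downstream classifier. The layer sizes, initialization, and pooling choice below are illustrative assumptions for the sketch, not the specific architectures evaluated in the paper.

```python
import numpy as np

def random_conv_features(x, n_filters=16, kernel_size=64, seed=0):
    """Embed a 1-D waveform with a single randomly weighted conv layer.

    The weights are drawn once from a He-style Gaussian and never trained;
    ReLU plus global average pooling turns the activations into a fixed-size
    feature vector. All hyperparameters here are illustrative.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, np.sqrt(2.0 / kernel_size), size=(n_filters, kernel_size))
    feats = []
    for k in range(n_filters):
        act = np.convolve(x, w[k], mode="valid")  # 1-D convolution
        act = np.maximum(act, 0.0)                # ReLU non-linearity
        feats.append(act.mean())                  # global average pooling
    return np.array(feats)

# Toy usage: two synthetic "classes" (low- vs high-frequency tones)
t = np.arange(4096) / 4096.0
low = random_conv_features(np.sin(2 * np.pi * 50 * t))
high = random_conv_features(np.sin(2 * np.pi * 400 * t))
# `low` and `high` are 16-dimensional embeddings that a separate
# classifier (e.g., an SVM or ELM, as in the paper) would consume.
```

Because the seed fixes the weights, the extractor is deterministic, which is what allows the paper's comparison to attribute accuracy differences to the architecture rather than to training.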
