Music instrument recognition using deep convolutional neural networks

Abstract

Identifying musical instruments in polyphonic music is a challenge in music information retrieval. This work presents a deep convolutional neural network framework for predominant instrument recognition in real-world polyphonic music. The network is trained on fixed-length music excerpts, each labeled with a single predominant instrument, and estimates an arbitrary number of instruments from variable-length audio signals. A Mel spectrogram representation maps the audio data into matrix form. An eight-layer convolutional neural network performs the instrument recognition. The ReLU activation function introduces non-linearity into the network. At each layer, max pooling reduces the dimensionality of the feature maps. Dropout is used for regularization to prevent the network from overfitting. The softmax function gives the probability of each instrument. The proposed approach achieves 92.8% accuracy.
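The building blocks named in the abstract (Mel scaling, ReLU, max pooling, dropout, softmax) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names are hypothetical, the Mel formula is the common HTK-style variant (the abstract does not say which one is used), and the 2x2 pooling window and dropout handling are assumptions chosen for illustration.

```python
import math
import random


def hz_to_mel(freq_hz):
    """Map a frequency in Hz to the mel scale (HTK-style formula, assumed)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def relu(matrix):
    """Element-wise ReLU: negative activations become zero."""
    return [[max(0.0, v) for v in row] for row in matrix]


def max_pool_2x2(matrix):
    """Non-overlapping 2x2 max pooling; odd trailing rows/columns are dropped."""
    rows = len(matrix) // 2 * 2
    cols = len(matrix[0]) // 2 * 2
    return [
        [
            max(matrix[r][c], matrix[r][c + 1],
                matrix[r + 1][c], matrix[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`
    and rescale the survivors by 1 / (1 - rate). Training-time only."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def softmax(logits):
    """Numerically stable softmax: outputs are positive and sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 2x4 "feature map" standing in for one convolution output.
feature_map = [
    [-1.0, 2.0, 0.5, -3.0],
    [4.0, -2.0, 1.5, 0.0],
]
pooled = max_pool_2x2(relu(feature_map))  # [[4.0, 1.5]]
probs = softmax(pooled[0])                # two pseudo "instrument" probabilities
rng = random.Random(0)                    # seeded for repeatable dropout masks
dropped = dropout(pooled[0], 0.5, rng)
```

In a real system these operations would come from a deep learning framework and be applied to Mel spectrogram matrices rather than toy lists; the sketch only shows what each step computes.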
