Convolution-based classification of audio and symbolic representations of music

Abstract

We present a novel convolution-based method for classifying audio and symbolic representations of music, which we apply to classification of music by style. Pieces of music are first sampled to pitch–time representations (spectrograms or piano-rolls) and then convolved with a Gaussian filter, before being classified by a support vector machine or by k-nearest neighbours in an ensemble of classifiers. On the well-studied task of discriminating between string quartet movements by Haydn and Mozart, we obtain accuracies that equal the state of the art on two datasets. In multi-class composer identification, however, methods specialised for classifying symbolic representations of music are more effective. We also perform experiments on symbolic representations, synthetic audio and two different recordings of The Well-Tempered Clavier by J. S. Bach to study the method's capacity to distinguish preludes from fugues. Our experimental results show that the approach performs similarly on symbolic representations, synthetic audio and audio recordings, setting our method apart from most previous studies, which have been designed for use with either audio or symbolic data, but not both.
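The pipeline described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy "pieces", the band-limited note generator, and the filter width `sigma` are all invented here for demonstration, and a real experiment would use actual piano-rolls or spectrograms and cross-validated evaluation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_piece(low, high, n_pitches=32, n_frames=64, n_notes=20):
    """Toy piano-roll: a binary pitch-time matrix with notes in a pitch band."""
    roll = np.zeros((n_pitches, n_frames))
    pitches = rng.integers(low, high, size=n_notes)
    times = rng.integers(0, n_frames, size=n_notes)
    roll[pitches, times] = 1.0
    return roll

# Two synthetic "styles": low-register versus high-register pieces.
pieces = [make_piece(2, 12) for _ in range(20)] + [make_piece(18, 28) for _ in range(20)]
labels = [0] * 20 + [1] * 20

# Convolve each piano-roll with a 2-D Gaussian filter (smoothing in both the
# pitch and time dimensions), then flatten the result into a feature vector.
sigma = 2.0  # filter width in (pitch, time) bins -- a tunable hyperparameter
features = np.array([gaussian_filter(p, sigma=sigma).ravel() for p in pieces])

# Classify the smoothed representations with a support vector machine.
clf = SVC(kernel="linear").fit(features, labels)
print(clf.score(features, labels))
```

Because the Gaussian blur spreads each note's energy over neighbouring pitch and time bins, the same code applies unchanged whether the input matrix is a piano-roll or a spectrogram, which is the property that lets the method handle symbolic and audio data alike.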
