Understandable models of music collections based on exhaustive feature generation with temporal statistics

Data mining in large collections of polyphonic music has attracted increasing interest from companies since the advent of commercial online music distribution. Important applications include categorizing songs into genres and recommending songs according to musical similarity and the customer's musical preferences. Modeling the genre or timbre of polyphonic music is at the core of these tasks and has been recognized as a difficult problem. Many audio features have been proposed, but they do not provide easily understandable descriptions of music: they explain neither why a genre was chosen nor in what way one song is similar to another. We present an approach that combines large-scale feature generation with meta-learning techniques to obtain meaningful features for musical similarity. We perform exhaustive feature generation based on temporal statistics and train regression models that summarize a subset of these features into a single descriptor of a particular notion of music. Using several such models, we produce a concise semantic description of each song. Genre classification models based on these semantic features are shown to be more understandable than, and almost as accurate as, traditional methods.
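The feature generation step can be illustrated with a minimal sketch. It assumes that each song is represented by a few frame-level series of low-level features and that every series is crossed with every transform and every temporal statistic. The names `TRANSFORMS`, `STATISTICS`, and `generate_features`, the particular statistics, and the toy values are all hypothetical and are not taken from the paper; the regression step that summarizes the generated features is not shown.

```python
import itertools
import statistics

# Hypothetical transforms applied to a frame-level series before summarizing.
TRANSFORMS = {
    "raw": lambda xs: list(xs),
    "delta": lambda xs: [b - a for a, b in zip(xs, xs[1:])],
}

# Hypothetical temporal statistics that collapse a series into one number.
STATISTICS = {
    "mean": statistics.fmean,
    "std": statistics.pstdev,
    "min": min,
    "max": max,
}


def generate_features(low_level):
    """Cross every low-level series with every transform and statistic."""
    features = {}
    combos = itertools.product(
        low_level.items(), TRANSFORMS.items(), STATISTICS.items()
    )
    for (name, series), (t_name, transform), (s_name, stat) in combos:
        # The feature name records how the value was derived.
        features[f"{s_name}({t_name}({name}))"] = stat(transform(series))
    return features


# Toy frame-level values for one song (made up for illustration).
song = {
    "loudness": [0.2, 0.4, 0.6, 0.4],
    "spectral_centroid": [1000.0, 1200.0, 900.0, 1100.0],
}
features = generate_features(song)
```

Because each generated name spells out its own derivation, a model that later selects a handful of these features stays readable, which is in the spirit of the understandability goal stated above.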
