Preference Music Ratings Prediction Using Tokenization and Minimum Classification Error Training

To address two main limitations of current content-based music recommendation approaches, an ordinal regression algorithm for music recommendation that incorporates dynamic information is presented. Instead of assuming that local spectral features within a song are independent and identically distributed samples of an underlying probability density, music is characterized by a vocabulary of acoustic segment models (ASMs), which are found by an unsupervised process. Further, instead of classifying music into subjective classes, such as genre, or trying to find a universal notion of similarity, songs are classified according to personal preference ratings. The ordinal regression approach to ratings prediction is based on the discriminative training algorithm known as minimum classification error (MCE) training. Experimental results indicate that improved temporal modeling outperforms standard spectral-based music representations. Further, the MCE-based preference ratings algorithm is shown to be superior to two other systems. Analysis shows that this advantage is due to MCE being a non-conservative algorithm that exhibits immunity to outliers.
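As a rough illustration of the outlier behavior mentioned above, the following is a minimal sketch of the generic multi-class MCE loss as it is commonly formulated in the speech recognition literature: a soft misclassification measure passed through a sigmoid. It is not the ordinal regression variant used in this work. The function names, the smoothing parameters `eta`, `gamma`, and `theta`, and the hinge loss included for comparison are all assumptions made for the example.

```python
import math

def misclassification_measure(scores, true_idx, eta=1.0):
    """d(x) = -g_true(x) + (1/eta) * log(mean over rivals of exp(eta * g_j(x))).

    Positive d means a rival class outscores the true class.
    """
    rivals = [s for j, s in enumerate(scores) if j != true_idx]
    m = max(rivals)  # shift by the max rival score for numerical stability
    mean_exp = sum(math.exp(eta * (s - m)) for s in rivals) / len(rivals)
    soft_best_rival = m + math.log(mean_exp) / eta
    return soft_best_rival - scores[true_idx]

def mce_loss(d, gamma=1.0, theta=0.0):
    """Smoothed 0-1 loss: a sigmoid of d, so it always stays between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-gamma * d + theta))

def hinge_loss(d):
    """Margin-style loss shown only for contrast; grows without bound in d."""
    return max(0.0, 1.0 + d)

# Hypothetical discriminant scores for three rating classes.
typical = misclassification_measure([2.0, 0.0, 0.0], true_idx=0)   # well classified
outlier = misclassification_measure([0.0, 20.0, 20.0], true_idx=0)  # badly mislabeled

for name, d in [("typical", typical), ("outlier", outlier)]:
    print(name, round(d, 3), round(mce_loss(d), 3), round(hinge_loss(d), 3))
```

Because the sigmoid saturates, a badly misclassified example contributes a loss near 1 and a gradient near 0, whereas an unbounded loss such as the hinge keeps growing with the size of the error. This is one commonly given reason for the robustness of MCE to outliers; moderate score values are assumed here so that `math.exp` does not overflow.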
