Paper Information - Preference Music Ratings Prediction Using Tokenization and Minimum Classification Error Training

To address two main limitations of current content-based music recommendation approaches, an ordinal regression algorithm for music recommendation that incorporates dynamic information is presented. Instead of assuming that local spectral features within a song are independent and identically distributed samples of an underlying probability density, music is characterized by a vocabulary of acoustic segment models (ASMs), which are learned through an unsupervised process. Further, instead of classifying music based on subjective classes, such as genre, or trying to find a universal notion of similarity, songs are classified based on personal preference ratings. The ordinal regression approach used for ratings prediction is based on the discriminative training algorithm known as minimum classification error (MCE) training. Experimental results indicate that improved temporal modeling leads to better performance than standard spectral-based music representations. Further, the MCE-based preference ratings algorithm is shown to outperform two other systems. Analysis demonstrates that the superior performance is due to MCE being a non-conservative algorithm that is robust to outliers.
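The outlier-robustness claim can be illustrated with the generic multi-class MCE loss from the speech recognition literature. The sketch below is not the paper's ordinal regression formulation, which the abstract does not spell out; the function names, the parameters `eta`, `gamma`, and `theta`, and the example scores are all assumptions chosen for illustration. It shows that the smoothed MCE loss is bounded by 1 for a badly misclassified sample, whereas a hinge-style loss keeps growing with the size of the error.

```python
import math


def misclassification_measure(scores, true_class, eta=2.0):
    """Generic MCE measure: d_k = -g_k + (1/eta) * log(mean_{j != k} exp(eta * g_j)).

    A positive value means the sample is misclassified.
    """
    competitors = [g for j, g in enumerate(scores) if j != true_class]
    m = max(competitors)  # subtract the max for a numerically stable log-mean-exp
    mean_exp = sum(math.exp(eta * (g - m)) for g in competitors) / len(competitors)
    return -scores[true_class] + m + math.log(mean_exp) / eta


def sigmoid_loss(d, gamma=1.0, theta=0.0):
    """Smoothed 0-1 loss used in MCE training; always lies in [0, 1]."""
    z = gamma * d - theta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hinge_loss(d):
    """Margin-based loss for comparison; unbounded as d grows."""
    return max(0.0, 1.0 + d)


# Hypothetical scores for three rating classes; index 2 is the true rating.
typical = misclassification_measure([0.1, 0.2, 3.0], true_class=2)
outlier = misclassification_measure([50.0, 0.0, 0.0], true_class=2)

for name, d in (("typical", typical), ("outlier", outlier)):
    print(name, round(d, 3), round(sigmoid_loss(d), 3), round(hinge_loss(d), 3))
```

Because the sigmoid saturates, the gradient contribution of an extreme outlier shrinks toward zero, so a handful of mislabeled or atypical songs cannot dominate training in the way they can under an unbounded loss.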

Jeremy Reed | Chin-Hui Lee
