Automatic Prediction of Hit Songs

We explore the automatic analysis of music to identify likely hit songs. We extract both acoustic and lyric information from each song and separate hits from non-hits using standard classifiers, specifically Support Vector Machines and boosting classifiers. Our features are based on global sounds learnt in an unsupervised fashion from acoustic data or global topics learnt from a lyrics database. Experiments on a corpus of 1700 songs demonstrate performance that is much better than random. The lyricbased features are slightly more useful than the acoustic features in correctly identifying hit songs. Concatenating the two features does not produce significant improvements. Analysis of the lyric-based features shows that the absence of certain semantic information indicates that a song is more likely to be a hit.

[1]  Jonathan Foote,et al.  Content-based retrieval of music and audio , 1997, Other Conferences.

[2]  Paul A. Viola,et al.  Boosting Image Retrieval , 2004, International Journal of Computer Vision.

[3]  Thomas Hofmann,et al.  Probabilistic Latent Semantic Analysis , 1999, UAI.

[4]  R. Schapire The Strength of Weak Learnability , 1990, Machine Learning.

[5]  George Tzanetakis,et al.  Musical genre classification of audio signals , 2002, IEEE Trans. Speech Audio Process..

[6]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[7]  Daniel P. W. Ellis,et al.  A Large-Scale Evaluation of Acoustic and Subjective Music-Similarity Measures , 2004, Computer Music Journal.

[8]  Daniel P. W. Ellis,et al.  Anchor space for classification and similarity measurement of music , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[9]  Beth Logan,et al.  Semantic analysis of song lyrics , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).