An Analysis of Classification Approaches for Hit Song Prediction using Engineered Metadata Features with Lyrics and Audio Features

Hit song prediction, one of the emerging fields in music information retrieval (MIR), remains a considerable challenge. Understanding what makes a given song a hit would benefit the whole music industry. Previous approaches to hit song prediction have focused on the audio features of a record. This study aims to improve prediction of the top 10 hits among Billboard Hot 100 songs by drawing on a broader set of metadata: song audio features provided by Spotify, song lyrics, and novel metadata-based features (title topic, popularity continuity and genre class). Five machine learning approaches are applied: k-nearest neighbours, Naive Bayes, Random Forest, Logistic Regression and Multilayer Perceptron. Our results show that Random Forest (RF) and Logistic Regression (LR) trained on all features (novel features, song audio features and lyrics features) outperform the other models, achieving 89.1% and 87.2% accuracy, and 0.91 and 0.93 AUC, respectively. Our findings also demonstrate the utility of our novel music metadata features, which contributed most to the models' discriminative performance.
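The two reported metrics need not rank models the same way: RF has the higher accuracy while LR has the higher AUC. The sketch below is not the study's evaluation code; it is a minimal pure-Python illustration of how the two metrics are computed, using the pairwise-ranking definition of AUC. The labels and scores are hypothetical toy values, and the 0.5 decision threshold is an assumption.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of songs classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    correct = sum(p == t for p, t in zip(preds, y_true))
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random hit is scored above a random non-hit.

    Ties count as half a win. This pairwise form equals the area under
    the ROC curve and does not depend on any threshold.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one hit and one non-hit")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = top-10 hit, 0 = not a top-10 hit.
y_true = [1, 0, 0, 0, 1, 0, 0, 0]
# Hypothetical classifier scores (estimated probability of being a hit).
y_score = [0.9, 0.2, 0.6, 0.1, 0.4, 0.3, 0.05, 0.35]

print(f"accuracy = {accuracy(y_true, y_score):.3f}")  # 0.750
print(f"AUC      = {roc_auc(y_true, y_score):.3f}")   # 0.917
```

In this toy case one hit falls below the threshold and one non-hit rises above it, so accuracy is 0.75, yet the hits are still ranked above almost every non-hit, so AUC is about 0.92. Accuracy depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds.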
