Correlation analysis among the metadata-based similarity, acoustic-based distance, and serendipity of music

With the aim of realizing serendipity-oriented music recommendation, we analyzed the correlation between music similarity and serendipity. A user may be familiar with a musical piece if its metadata, such as the artist's name and the title, is similar to that of music he/she has previously listened to. In addition, a user may prefer a piece if it is acoustically similar to the music he/she prefers. Based on these notions, we set up the following hypotheses. Hypothesis I: the user is familiar with the music if the metadata-based similarity between it and the music he/she prefers is high. Hypothesis II: the music is preferred by the user if the acoustic-based distance between it and the music he/she prefers is low. Hypothesis III: the music is serendipitous (unexpected and useful) if it has both a low metadata-based similarity to and a low acoustic-based distance from the music he/she prefers. This paper presents our examination of the above hypotheses using data from 1,000 real musical recordings.
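
The three hypotheses can be read as simple threshold rules on two scores. The following is a minimal illustrative sketch, not the procedure used in this study: the names `Candidate` and `predict_labels`, the normalization of both scores to the range [0, 1], and the threshold values of 0.5 are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate piece scored against the user's preferred music.

    Both scores are assumed (hypothetically) to be normalized to [0, 1].
    """
    title: str
    metadata_similarity: float  # higher means more similar metadata
    acoustic_distance: float    # lower means acoustically closer


def predict_labels(piece: Candidate,
                   similarity_threshold: float = 0.5,
                   distance_threshold: float = 0.5) -> dict:
    """Apply the three hypotheses as threshold rules.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    # Hypothesis I: high metadata-based similarity implies familiarity.
    familiar = piece.metadata_similarity >= similarity_threshold
    # Hypothesis II: low acoustic-based distance implies preference.
    preferred = piece.acoustic_distance <= distance_threshold
    # Hypothesis III: low similarity together with low distance
    # implies serendipity (unexpected and useful).
    serendipitous = (not familiar) and preferred
    return {
        "familiar": familiar,
        "preferred": preferred,
        "serendipitous": serendipitous,
    }


if __name__ == "__main__":
    piece = Candidate("unknown artist, similar sound", 0.1, 0.2)
    print(predict_labels(piece))
```

Under these assumed thresholds, a piece with metadata-based similarity 0.1 and acoustic-based distance 0.2 would be labeled unfamiliar, preferred, and therefore serendipitous, whereas a piece with high similarity would be labeled familiar and hence not serendipitous regardless of its acoustic distance.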