Predicting Billboard Success Using Data-Mining in P2P Networks

Peer to Peer networks are the leading cause for music piracy but also used for music sampling prior to purchase. In this paper we investigate the relations between music file sharing and sales (both physical and digital)using large Peer-to-Peer query database information. We compare file sharing information on songs to their popularity on the Billboard Hot 100 and the Billboard Digital Songs charts, and show that popularity trends of songs on the Billboard have very strong correlation (0.88-0.89) to their popularity on a Peer-to-Peer network. We then show how this correlation can be utilized by common data mining algorithms to predict a song's success in the Billboard in advance, using Peer-to-Peer information.