An analysis of the 2014 RecSys Challenge

The RecSys Challenge 2014 focuses on the engagement generated by tweets posted by users of the IMDb smartphone application. Such engagement depends on attributes of the user who posts the message (e.g., their role in the social network), of the tweet content (e.g., the rating), and of the movie the tweet refers to (e.g., its popularity). In this work we analyze the dataset and the task to help participants better understand the challenge, and we propose a baseline prediction algorithm. We split our analysis into three stages: (i) data enrichment, (ii) knowledge extraction, and (iii) engagement prediction. First, we enriched the dataset with additional movie attributes extracted from IMDb and Freebase. Next, we analyzed the statistics of the main tweet attributes and applied machine learning techniques to extract additional knowledge from the data. Finally, we defined a predictor based on the main outcomes of these analyses: a linear regression model on attributes such as the user rating score, the presence of mentions, and whether the tweet is a retweet or has already been retweeted. This predictor achieved an nDCG@10 of 0.8352.
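
The prediction stage described above can be illustrated with a minimal sketch, assuming ordinary least squares as the fitting method and a linear-gain nDCG computed over a single ranked list. The feature columns, the toy values, and the function names (`fit_linear_model`, `predict`, `ndcg_at_k`) are hypothetical and are not taken from the challenge dataset; the exact nDCG variant and grouping used by the challenge may differ.

```python
import numpy as np


def fit_linear_model(features, engagement):
    """Fit intercept + weights by ordinary least squares (assumed method)."""
    X = np.column_stack([np.ones(len(features)), features])
    weights, *_ = np.linalg.lstsq(X, engagement, rcond=None)
    return weights


def predict(weights, features):
    """Score tweets with the fitted linear model."""
    X = np.column_stack([np.ones(len(features)), features])
    return X @ weights


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain with linear gain (one common variant)."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(rel / discounts))


def ndcg_at_k(true_engagement, predicted_scores, k=10):
    """nDCG@k of the ranking induced by predicted_scores."""
    true_engagement = np.asarray(true_engagement, dtype=float)
    scores = np.asarray(predicted_scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score first
    ideal_dcg = dcg_at_k(np.sort(true_engagement)[::-1], k)
    if ideal_dcg == 0.0:
        return 0.0  # no engagement at all: define nDCG as 0
    return dcg_at_k(true_engagement[order], k) / ideal_dcg


# Toy data (invented for illustration only).
# Columns: rating (1-10), has_mention (0/1), is_retweet (0/1).
features = np.array(
    [[10, 1, 0], [7, 0, 0], [3, 0, 1], [9, 1, 1], [5, 0, 0], [1, 0, 0]],
    dtype=float,
)
engagement = np.array([3, 0, 1, 5, 0, 0], dtype=float)

weights = fit_linear_model(features, engagement)
scores = predict(weights, features)
print("weights:", weights)
print("nDCG@10:", ndcg_at_k(engagement, scores, k=10))
```

Because nDCG depends only on the order of the predicted scores, any monotonic rescaling of the model output leaves the metric unchanged; this is one reason a simple linear scorer can serve as a reasonable baseline for a ranking task.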