Predicting IMDB Movie Ratings Using Social Media

We predict IMDb movie ratings and consider two sets of features: surface and textual features. For the latter, we assume that no social media signal is isolated and use data from multiple channels that are linked to a particular movie, such as tweets from Twitter and comments from YouTube. We extract textual features from each channel to use in our prediction model and we explore whether data from either of these channels can help to extract a better set of textual feature for prediction. Our best performing model is able to rate movies very close to the observed values.

[1]  Gilad Mishne,et al.  Capturing Global Mood Levels using Blog Posts , 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.

[2]  Bernardo A. Huberman,et al.  Predicting the Future with Social Media , 2010, 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.

[3]  Maarten de Rijke,et al.  News Comments: Exploring, Modeling, and Online Prediction , 2010, ECIR.

[4]  M. de Rijke,et al.  Predicting the volume of comments on online news stories , 2009, CIKM.

[5]  Noah A. Smith,et al.  Movie Reviews and Revenues: An Experiment in Text Regression , 2010, NAACL.

[6]  J. Fox Applied Regression Analysis, Linear Models, and Related Methods , 1997 .

[7]  Peter Ingwersen,et al.  Developing a Test Collection for the Evaluation of Integrated Search , 2010, ECIR.

[8]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.