"My Curiosity was Satisfied, but not in a Good Way": Predicting User Ratings for Online Recipes

In this paper, we develop an approach to automatically predict user ratings for recipes at Epicurious.com, based on the recipes’ reviews. We investigate two distributional methods for feature selection, Information Gain and Bi-Normal Separation. We also compare distributionally selected features to linguistically motivated features, and we compare two architectures: a one-layer system, in which we aggregate all reviews of a recipe and predict its rating directly, and a two-layer system, in which the ratings of individual reviews are predicted and then aggregated. We obtain our best results with the two-layer architecture in combination with 5,000 features selected by Information Gain. This setup reaches an overall accuracy of 65.60%, given an upper bound of 82.57%.
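
The two feature selection scores named above can be illustrated with a minimal sketch. It assumes binary presence/absence features and is not the authors' implementation: the function names, the toy data, and the one-vs-rest treatment of Bi-Normal Separation are assumptions made for illustration. The clipping of rates to [0.0005, 0.9995] follows the usual formulation of Bi-Normal Separation, which avoids infinite values of the inverse normal CDF.

```python
import math
from collections import Counter
from statistics import NormalDist


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(present, labels):
    """Class entropy minus class entropy conditioned on feature presence."""
    with_f = [y for p, y in zip(present, labels) if p]
    without_f = [y for p, y in zip(present, labels) if not p]
    n = len(labels)
    conditional = (len(with_f) / n) * entropy(with_f) \
        + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - conditional


def bi_normal_separation(present, labels, positive, eps=0.0005):
    """|F^-1(tpr) - F^-1(fpr)| for one class against the rest (assumed setup).

    Requires at least one example of the positive class and one of the rest.
    """
    pos = [p for p, y in zip(present, labels) if y == positive]
    neg = [p for p, y in zip(present, labels) if y != positive]
    tpr = sum(pos) / len(pos)  # share of positive documents containing the feature
    fpr = sum(neg) / len(neg)  # share of other documents containing the feature

    def clip(rate):
        # Keep rates away from 0 and 1, where the inverse CDF is unbounded.
        return min(max(rate, eps), 1 - eps)

    inv = NormalDist().inv_cdf
    return abs(inv(clip(tpr)) - inv(clip(fpr)))


# Toy data (hypothetical): four reviews, ratings "4" or "1".
ratings = ["4", "4", "1", "1"]
perfect = [1, 1, 0, 0]   # feature occurs only in 4-rated reviews
useless = [1, 0, 1, 0]   # feature occurs equally often in both classes

ig_perfect = information_gain(perfect, ratings)
ig_useless = information_gain(useless, ratings)
bns_perfect = bi_normal_separation(perfect, ratings, positive="4")
bns_useless = bi_normal_separation(useless, ratings, positive="4")
```

Under either score, features would be ranked and the top k retained (for example k = 5,000, as in the best setup reported above). A feature that separates the classes perfectly receives the maximum score, and one distributed evenly across the classes receives zero.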
