How scales influence user rating behaviour in recommender systems

ABSTRACT Many websites allow users to rate items and share their ratings with others, for social or personalisation purposes. In recommender systems in particular, personalised suggestions are generated by predicting ratings for items the user is not yet aware of, based on the ratings that user provided for other items. Explicit user ratings are collected by means of graphical widgets referred to as ‘rating scales’. Each system or website normally uses a specific rating scale, which in many cases differs from the scales used by other systems in granularity, visual metaphor, numbering or the availability of a neutral position. Although many works in the field of survey design have reported on the effects of rating scales on user ratings, such scales are normally regarded as neutral tools when it comes to recommender systems. In this paper, we challenge this view and provide new empirical evidence about the impact of rating scales on user ratings, presenting the results of three new studies carried out in different domains. Based on these results, we show that a static mathematical mapping is not the best method for comparing ratings coming from scales with different features, and we indicate when linear functions can be used instead.
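For illustration, the kind of static mapping discussed here is typically a fixed linear rescaling from one scale's range onto another's. The sketch below shows such a mapping; the function name and the example scale ranges are hypothetical and not taken from the paper, which argues that a single fixed mapping of this sort is often inadequate.

```python
def map_rating_linear(rating: float,
                      src_min: float, src_max: float,
                      dst_min: float, dst_max: float) -> float:
    """Rescale a rating from a source scale to a destination scale
    with a fixed (static) linear function."""
    return dst_min + (rating - src_min) * (dst_max - dst_min) / (src_max - src_min)

# Example: a 4 on a 1-5 star scale becomes 7.75 on a 1-10 slider.
print(map_rating_linear(4, 1, 5, 1, 10))
```

Such a mapping treats every scale as interchangeable; the studies reported in the paper examine when this assumption holds and when it does not.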
