Detecting polarization in ratings: An automated pipeline and a preliminary quantification on several benchmark data sets

Personalized recommender systems are increasingly relevant to the study of polarization and bias, given their widespread use in filtering information spaces. Polarization is a social phenomenon with serious real-life consequences, particularly on social media. It is therefore important to understand how machine learning algorithms, especially recommender systems, behave in polarized environments. In this paper, we study polarization in the context of users' interactions with a space of items and how it affects recommender systems. We first formalize the concept of polarization based on item ratings and then relate it to item reviews to investigate any potential correlation. We then propose a domain-independent data science pipeline that automatically detects polarization from ratings, rather than from the properties typically used for this purpose, such as item content or social network topology. We perform an extensive comparison of polarization measures on several benchmark data sets and show that our polarization detection framework can detect different degrees of polarization and outperforms existing measures in capturing an intuitive notion of polarization. Our work is an essential step toward quantifying and detecting polarization in ongoing ratings and in benchmark data sets; to this end, we use our polarization detection pipeline to compute the polarization prevalence of several benchmark data sets. We hope that this work will support future research on the emerging topic of designing and studying the behavior of recommender systems in polarized environments.
