Simulation as Experiment: An Empirical Critique of Simulation Research on Recommender Systems

Simulation can enable the study of recommender system (RS) evolution while circumventing many of the issues of empirical longitudinal studies: simulations are comparatively easy to implement, are highly controlled, and pose no ethical risk to human participants. How simulation can best contribute to scientific insight about RS alongside qualitative and quantitative empirical approaches, however, remains an open question. Philosophers and researchers have long debated the epistemological nature of simulation relative to wholly theoretical or empirical methods [11, 23–27, 33]. Simulation is often implicitly or explicitly conceptualized as occupying a middle ground between empirical and theoretical approaches, allowing researchers to realize the benefits of both [11, 33]. However, such arguments often overlook that, without firm grounding in any single methodological tradition, simulation studies have no agreed-upon scientific norms or standards, resulting in a patchwork of theoretical motivations, approaches, and implementations that are difficult to reconcile. In this position paper, we argue that simulation studies of RS are conceptually similar to empirical experimental approaches and can therefore be evaluated using the standards of empirical research methods. Through this empirical lens, we argue that the combination of high heterogeneity in approaches and low transparency in methods has limited the interpretability, generalizability, and replicability of simulation studies of RS. We contend that by adopting standards and practices common in empirical disciplines, simulation researchers can mitigate many of these weaknesses.

[1] Gediminas Adomavicius et al. Toward identification and adoption of best practices in algorithmic recommender systems research. RepSys '13, 2013.

[2] Cailin O'Connor et al. Simulation and Similarity: Using Models to Understand the World. 2016.

[3] Jöran Beel et al. Towards reproducibility in recommender-systems research. User Modeling and User-Adapted Interaction, 2016.

[4] Dietmar Jannach et al. Are we really making much progress? A worrying analysis of recent neural recommendation approaches. RecSys, 2019.

[5] John Riedl et al. Rethinking the recommender research ecosystem: reproducibility, openness, and LensKit. RecSys '11, 2011.

[6] Tor Lattimore et al. Degenerate Feedback Loops in Recommender Systems. AIES, 2019.

[7] H. S. Booker et al. Research Methods for the Behavioral Sciences. 2020.

[8] J. Calder. Survey research methods. Medical Education, 1998.

[9] Arvind Narayanan et al. T-RECS: A Simulation Tool to Study the Societal Impact of Recommender Systems. arXiv, 2021.

[10] Thomas C. Schelling. Dynamic models of segregation. 1971.

[11] Julian Reiss. A Plea for (Good) Simulations: Nudging Economics Toward an Experimental Science. 2011.

[12] Barbara E. Engelhardt et al. How algorithmic confounding in recommendation systems increases homogeneity and decreases utility. RecSys, 2017.

[13] S. Peck. Simulation as experiment: a philosophical reassessment for biological modeling. Trends in Ecology & Evolution, 2004.

[14] Eric Winsberg. Simulated Experiments: Methodology for a Virtual World. Philosophy of Science, 2003.

[15] Michael C. Frank et al. Estimating the reproducibility of psychological science. Science, 2015.

[16] Eli Pariser. The Filter Bubble: What the Internet Is Hiding from You. 2011.

[17] Duncan J. Watts et al. The Structural Virality of Online Diffusion. Management Science, 2015.

[18] B. Wansink et al. Asking Questions: The Definitive Guide to Questionnaire Design – For Market Research, Political Polls, and Social and Health Questionnaires. 2004.

[19] Matthew J. Salganik. Bit by Bit: Social Research in the Digital Age. The Journal of Mathematical Sociology, 2019.

[20] Bamshad Mobasher et al. Feedback Loop and Bias Amplification in Recommender Systems. CIKM, 2020.

[21] Kathleen M. Eisenhardt et al. Developing Theory Through Simulation Methods. 2006.

[22] D. Mook. In defense of external invalidity. 1983.

[23] Dietmar Jannach et al. Recommendations with a Purpose. RecSys, 2016.

[24] Julian Reiss. Against external validity. Synthese, 2019.

[25] Monya Baker. Cancer reproducibility project releases first results. Nature, 2017.

[26] Wendy S. Parker. Evidence and Knowledge from Computer Simulation. Erkenntnis, 2020.

[27] Wendy S. Parker. Does matter really matter? Computer simulations, experiments, and materiality. Synthese, 2009.

[28] L. Cronbach et al. Construct validity in psychological tests. Psychological Bulletin, 1955.

[29] Mahzarin R. Banaji et al. The Bankruptcy of Everyday Memory. 1989.

[30] Guy Aridor et al. Deconstructing the Filter Bubble: User Decision-Making and Recommender Systems. RecSys, 2019.

[31] Wendy S. Parker. Franklin, Holmes, and the Epistemology of Computer Simulation. 2008.