Towards Better User Studies in Computer Graphics and Vision

Online crowdsourcing platforms have made it increasingly easy to evaluate algorithm outputs with survey questions like "which image is better, A or B?", leading to the proliferation of such studies in vision and graphics research papers. Results of these studies are often used as quantitative evidence in support of a paper's contributions. On the one hand, we argue that when such studies are conducted hastily as an afterthought, they lead to an increase in uninformative and potentially misleading conclusions. On the other hand, in these same communities, user research is underutilized for driving project direction and for forecasting user needs and reception. We call for increased attention to both the design and the reporting of user studies in computer vision and graphics papers, towards (1) improved replicability and (2) improved project direction. Together with this call, we offer an overview of methodologies from user experience research (UXR), human-computer interaction (HCI), and applied perception to increase exposure to the available methodologies and best practices. We discuss foundational user research methods (e.g., needfinding) that are presently underutilized in computer vision and graphics research but can provide valuable project direction. We provide further pointers to the literature for readers interested in exploring other UXR methodologies. Finally, we describe broader open issues and recommendations for the research community.