Between Subjectivity and Imposition

The interpretation of data is fundamental to machine learning. This paper investigates practices of image data annotation as performed in industrial contexts. We define data annotation as a sense-making practice, where annotators assign meaning to data through the use of labels. Previous human-centered investigations have largely focused on annotators' subjectivity as a major cause of biased labels. We propose a wider view on this issue: guided by constructivist grounded theory, we conducted several weeks of fieldwork at two annotation companies. We analyzed which structures, power relations, and naturalized impositions shape the interpretation of data. Our results show that the work of annotators is profoundly informed by the interests, values, and priorities of other actors above their station. Arbitrary classifications are vertically imposed on annotators and, through them, on data. This imposition is largely naturalized. Assigning meaning to data is often presented as a technical matter. This paper shows it is, in fact, an exercise of power with multiple implications for individuals and society.
