Between Subjectivity and Imposition

The interpretation of data is fundamental to machine learning. This paper investigates practices of image data annotation as performed in industrial contexts. We define data annotation as a sense-making practice, where annotators assign meaning to data through the use of labels. Previous human-centered investigations have largely focused on annotators' subjectivity as a major cause of biased labels. We propose a wider view on this issue: guided by constructivist grounded theory, we conducted several weeks of fieldwork at two annotation companies. We analyzed which structures, power relations, and naturalized impositions shape the interpretation of data. Our results show that the work of annotators is profoundly informed by the interests, values, and priorities of other actors above their station. Arbitrary classifications are vertically imposed on annotators, and through them, on data. This imposition is largely naturalized. Assigning meaning to data is often presented as a technical matter. This paper shows it is, in fact, an exercise of power with multiple implications for individuals and society.
