Taking Big Data apart: local readings of composite media collections

ABSTRACT If we are to think critically about Big Data initiatives, we must learn to take them apart. This paper explains how to interrogate Big Data, not as large homogeneous resources, but as heterogeneous collections with varied and discordant local ties. My argument focuses on the Big Data of media collections: composite digital repositories of texts, images, and video created in different contexts, but brought together online. The primary example used in this paper is the Digital Public Library of America (DPLA), a collection composed of digitized library, museum, and archive records from institutions across the United States. I demonstrate how local readings of DPLA data can uncover schemata, errors, infrastructures, classifications, absences, and rituals that have important origins. Moreover, I explain how identifying these local features can support new forms of scholarship, pedagogy, and advocacy in the face of Big Data. For this last point, I use two additional cases: NewsScape, a real-time archive of broadcast news, and Zillow, a marketplace for real estate listings. The range of examples demonstrates how the stakes change from one Big Data initiative to the next. The paper concludes with a set of speculative guidelines for attending to the local conditions in Big Data: get dirty, take a comparative approach, show context, use data to connect people, and create opportunities for the collection of counter-data. When working with Big Data, I argue that thinking locally is thinking critically.
