Big Data, Big Questions | A Dozen Ways to Get Lost in Translation: Inherent Challenges in Large-Scale Data Sets

As the late Susan Leigh Star noted, technoscientific research always involves simplification and standardization. In recent years, the collection and analysis of large-scale data sets (LSDS) have become the norm. These are often convenience samples analyzed by data mining techniques, and they are often used as the basis for public and private policy and action. At the same time, the term “large-scale” suggests completeness, while the ease of collection and analysis suggests that little else need be done. Both suggestions tend to crowd out other interpretations; hence, understanding the limits of these data sets should be of the utmost concern. This article discusses a number of issues that arise from the necessary but potentially problematic simplifications and standardizations found in LSDS.
