Trust in Data Science

The trustworthiness of data science systems in applied and real-world settings emerges from the resolution of specific tensions through situated, pragmatic, and ongoing forms of work. Drawing on research in CSCW, critical data studies, and the history and sociology of science, as well as six months of immersive ethnographic fieldwork with a corporate data science team, we describe four common tensions in applied data science work: (un)equivocal numbers, (counter)intuitive knowledge, (in)credible data, and (in)scrutable models. We show how organizational actors establish and re-negotiate trust under messy and uncertain analytic conditions through practices of skepticism, assessment, and credibility. Highlighting the collaborative and heterogeneous nature of real-world data science, we show how the management of trust in applied corporate data science settings depends not only on pre-processing and quantification, but also on negotiation and translation. We conclude by discussing the implications of our findings for data science research and practice, both within and beyond CSCW.
