Managing the Public to Manage Data: Citizen Science and Astronomy

Citizen Cyberscience Projects (CCPs), which recruit members of the public as volunteers to process and produce large datasets, promise substantial benefits for scientists and science. If this promise is to be realised, and citizen science-produced datasets are to be widely used by scientists, these datasets must win the trust of the scientific community. Securing this credibility involves, in part, applying standard scientific procedures to clean up datasets formed from volunteer contributions. Managing how volunteers contribute also plays a significant role, improving both the quality of individual contributions and the overall robustness of the resulting datasets, and so helps CCPs secure a reputation for producing trustworthy data. Through a case study of Galaxy Zoo, a CCP set up to generate datasets based on volunteer classifications of galaxy morphologies, this paper explores how those running the project manage volunteers. In particular, it focuses on how methods for crediting volunteer contributions motivate volunteers to provide higher quality contributions and to behave in ways that better match the statistical assumptions made when combining those contributions into datasets. These methods have contributed significantly to the project’s success in securing trust in its datasets, which have been widely used by other scientists. The paper then presents implications for practice for CCPs: a list of considerations to guide choices about how to credit volunteer contributions so as to improve the quality and trustworthiness of citizen science-produced datasets.
