Handling uncertainty in citizen science data: Towards an improved amateur-based large-scale classification

Abstract: Citizen Science, traditionally understood as the engagement of amateur participants in research, shows great potential for large-scale data processing. In areas such as astronomy, biology, or the geosciences, where emerging technologies generate huge volumes of data, Citizen Science projects enable image classification at a rate that experts alone could not achieve. However, this approach introduces bias and uncertainty into the results, since participants are typically non-experts in the problem and have variable skills. Consequently, the research community tends not to trust Citizen Science outcomes, citing a general lack of accuracy and validation. We introduce a novel multi-stage approach to handle uncertainty in data labelled by amateurs in Citizen Science projects. First, our method applies a set of transformations that leverage the uncertainty in amateur classifications. Then, a hybridisation strategy provides the best aggregation of the transformed data to improve the quality of, and confidence in, the results. As a case study, we consider Galaxy Zoo, a project devoted to labelling galaxy images. A limited set of expert classifications allows us to validate the experiments, confirming that our approach greatly boosts accuracy and classifies more images than the state of the art.
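To make the aggregation problem concrete, the following is a minimal sketch of a simple vote-fraction baseline, not the multi-stage method proposed in the paper. It assumes each image receives a list of amateur labels and that an image is classified only when the leading label reaches an agreement threshold. The function name `aggregate_votes`, the label strings, and the default threshold of 0.8 are all hypothetical choices made for illustration.

```python
from collections import Counter


def aggregate_votes(votes, threshold=0.8):
    """Aggregate amateur labels for one image by vote fraction.

    Hypothetical baseline for illustration only: returns the leading
    label and its vote fraction when agreement reaches `threshold`,
    otherwise returns "uncertain" with that same fraction.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    # Leading label and the number of votes it received.
    label, n = counts.most_common(1)[0]
    fraction = n / len(votes)
    if fraction >= threshold:
        return label, fraction
    # Agreement is too low, so the image is left unclassified.
    return "uncertain", fraction


if __name__ == "__main__":
    print(aggregate_votes(["spiral"] * 9 + ["elliptical"]))
    print(aggregate_votes(["spiral"] * 6 + ["elliptical"] * 4))
```

Under such a baseline, every image that falls below the threshold stays unclassified, which is the kind of loss an approach that exploits the uncertainty in amateur classifications aims to reduce.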
