Crowdsourcing the General Public for Large Scale Molecular Pathology Studies in Cancer

Background Citizen science, scientific research conducted by non-specialists, has the potential to facilitate biomedical research using available large-scale data; however, validating the results is challenging. Cell Slider is a citizen science project that shares images of tumors with the general public, enabling them to score tumor markers independently through an internet-based interface.

Methods From October 2012 to June 2014, 98,293 Citizen Scientists accessed the Cell Slider web page and scored 180,172 sub-images derived from images of 12,326 tissue microarray cores labeled for estrogen receptor (ER). We evaluated the accuracy of the Citizen Scientists' ER classification, and the association between ER status and prognosis, by comparing their performance against that of trained pathologists.

Findings The area under the ROC curve was 0.95 (95% CI 0.94 to 0.96) for cancer cell identification and 0.97 (95% CI 0.96 to 0.97) for ER status. ER-positive tumors scored by Citizen Scientists were associated with survival in a similar way to those scored by trained pathologists. Based on Citizen Scientists' classification, survival probability at 15 years was 0.78 (95% CI 0.76 to 0.80) for ER-positive and 0.72 (95% CI 0.68 to 0.77) for ER-negative tumors. Based on pathologist classification, survival probability was 0.79 (95% CI 0.77 to 0.81) for ER-positive and 0.71 (95% CI 0.67 to 0.74) for ER-negative tumors. For ER scored by Citizen Scientists, the hazard ratio for death was 0.26 (95% CI 0.18 to 0.37) at diagnosis and became greater than one after 6.5 years of follow-up. For ER scored by pathologists, it was 0.24 (95% CI 0.18 to 0.33) at diagnosis, increasing thereafter to one after 6.7 (95% CI 4.1 to 10.9) years of follow-up.

Interpretation Crowdsourcing the general public to classify cancer pathology data for research is viable, engages the public, and provides accurate ER data. Crowdsourced classification of research data may offer a valid solution to throughput problems that require human input.
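To make the accuracy comparison concrete, the following is a minimal Python sketch of how crowd scores could be compared against a pathologist reference using the area under the ROC curve. It is an illustration only, not the study's pipeline: the per-image aggregation (fraction of positive votes), the function names `aggregate_votes` and `roc_auc`, and the toy data are all assumptions introduced here.

```python
from collections import defaultdict


def aggregate_votes(votes):
    """Collapse (image_id, is_positive) votes into a per-image fraction of positive votes.

    Assumption: a simple vote fraction is used as the crowd score.
    """
    counts = defaultdict(lambda: [0, 0])  # image_id -> [positive votes, total votes]
    for image_id, is_positive in votes:
        counts[image_id][0] += int(is_positive)
        counts[image_id][1] += 1
    return {image_id: pos / total for image_id, (pos, total) in counts.items()}


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): crowd votes per image and a pathologist reference label.
votes = [
    ("a", True), ("a", True), ("a", False),
    ("b", False), ("b", False),
    ("c", True), ("c", False),
    ("d", True), ("d", True), ("d", True), ("d", False),
]
reference = {"a": False, "b": False, "c": True, "d": True}

crowd = aggregate_votes(votes)
image_ids = sorted(reference)
auc = roc_auc([crowd[i] for i in image_ids], [reference[i] for i in image_ids])
print(auc)  # prints 0.75
```

The pairwise formulation is quadratic in the number of images, which is fine for a sketch; a rank-based computation would be the usual choice for a data set the size of the one described above.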
