Crowdsourcing for translational research: analysis of biomarker expression using cancer microarrays

Background: Academic pathology suffers from an acute and growing workforce shortage. This particularly affects the translational elements of clinical trials, which can require detailed analysis of thousands of tissue samples. We tested whether crowdsourcing, that is, enlisting help from the public, is a sufficiently accurate method to score such samples.

Methods: We developed a novel online interface to train and test lay participants on cancer detection and immunohistochemistry scoring in tissue microarrays. Lay participants first performed cancer detection on lung cancer images stained for CD8. We measured how extending a basic tutorial with annotated example images and feedback-based training affected cancer detection accuracy. We then applied this tutorial to additional cancer types and immunohistochemistry markers (bladder/Ki67, lung/EGFR and oesophageal/CD8) to establish its accuracy compared with experts. Using this optimised tutorial, we then tested lay participants' accuracy on immunohistochemistry scoring of lung/EGFR and bladder/p53 samples.

Results: For cancer detection, annotated example images and feedback-based training both improved accuracy compared with the basic tutorial alone. Using the optimised tutorial, we demonstrate highly accurate detection of cancer (area under the curve >0.90) in samples stained with nuclear, cytoplasmic and membrane cell markers. We also observed high Spearman correlations between lay participants and experts for immunohistochemistry scoring: 0.91 (0.78, 0.96) for lung/EGFR samples and 0.97 (0.91, 0.99) for bladder/p53 samples.

Conclusions: These results establish crowdsourcing as a promising method for screening large data sets for biomarkers in cancer pathology research across a range of cancers and immunohistochemical stains.
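The two agreement measures in the Results can be made concrete with a short sketch. It assumes, hypothetically, that each tissue core receives one aggregated crowd score and that expert calls serve as the reference. For detection, the score is the fraction of lay participants who marked the core as cancer. For immunohistochemistry, it is an averaged crowd score. This aggregation is illustrative only and is not necessarily the authors' pipeline. The numbers are invented toy values, not study data. AUC is computed with the pairwise Mann-Whitney formulation. Spearman's rho is computed as the Pearson correlation of tie-averaged ranks. The function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def auc_mann_whitney(labels, scores):
    """ROC AUC as the probability that a cancer core outscores a non-cancer core.

    labels: expert reference (1 = cancer, 0 = no cancer).
    scores: aggregated crowd score per core (hypothetical aggregation).
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Toy, invented data (not from the study).
expert_cancer = [1, 1, 1, 0, 0, 0]                 # expert calls per core
crowd_fraction = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]    # fraction voting "cancer"
expert_ihc = [0, 10, 35, 60, 80, 95]               # expert IHC scores
crowd_ihc = [5, 20, 30, 70, 65, 90]                # averaged crowd IHC scores

print(round(auc_mann_whitney(expert_cancer, crowd_fraction), 3))  # → 0.889
print(round(spearman_rho(expert_ihc, crowd_ihc), 3))              # → 0.943
```

With SciPy and scikit-learn installed, `scipy.stats.spearmanr` and `sklearn.metrics.roc_auc_score` compute the same quantities. The pure-Python versions above simply make the definitions explicit.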