Phenotate: crowdsourcing phenotype annotations as exercises in undergraduate classes

Purpose: Computational documentation of genetic disorders relies heavily on structured data for differential diagnosis, pathogenic variant identification, and patient matchmaking. However, most information on rare diseases (RDs) exists in free-form text, such as academic literature. To increase the availability of structured RD data, we developed a crowdsourcing approach for collecting phenotype information using student assignments.

Methods: We developed Phenotate, a web application for crowdsourcing disease phenotype annotations through assignments for undergraduate genetics students. Using student-collected data, we generated composite annotations for each disease through a machine learning approach. These annotations were compared with those from clinical practitioners and gold standard curated data.

Results: Deploying Phenotate in five undergraduate genetics courses, we collected annotations for 22 diseases. Student-sourced annotations showed strong similarity to gold standards, with F-measures ranging from 0.584 to 0.868. Furthermore, clinicians used Phenotate annotations to identify diseases with accuracy comparable to that achieved with other annotation sources and gold standards. For six disorders, no gold standards were available, allowing us to create some of the first structured annotations for them, while students demonstrated the ability to research RDs.

Conclusion: Phenotate enables crowdsourcing of RD phenotypic annotations through educational assignments. Presented as an intuitive web-based tool, it offers pedagogical benefits and augments the computable RD knowledgebase.
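As a rough illustration of the F-measure reported in the Results, the sketch below scores one set of phenotype terms against a gold standard set. It assumes exact matching of term identifiers and the balanced F1 form; the abstract does not state how terms were matched or weighted, so this is not necessarily the computation used in the study. The function name `f_measure` and the term identifiers are hypothetical and chosen only for illustration.

```python
def f_measure(predicted, gold):
    """Balanced F-measure (F1) between two collections of term IDs.

    Assumption: terms match only when their identifiers are identical.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        # Covers empty inputs and fully disjoint sets.
        return 0.0
    precision = true_positives / len(predicted)  # share of predicted terms that are correct
    recall = true_positives / len(gold)          # share of gold terms that were recovered
    return 2 * precision * recall / (precision + recall)


# Illustrative identifiers only; not data from the study.
student_terms = {"HP:0001166", "HP:0001083", "HP:0002616", "HP:0000768"}
gold_terms = {"HP:0001166", "HP:0001083", "HP:0002616", "HP:0001519", "HP:0002650"}

score = f_measure(student_terms, gold_terms)
print(f"F-measure: {score:.3f}")
```

Here three of the four student terms appear in the gold set (precision 0.75) and three of the five gold terms are recovered (recall 0.60), so the score is 2/3.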
