AMELIE 3: Fully Automated Mendelian Patient Reanalysis at Under 1 Alert per Patient per Year

Background: Many thousands of patients with a suspected Mendelian disease have their exomes or genomes sequenced every year, but only about 30% receive a definitive diagnosis. Because a novel Mendelian gene-disease association is published on average every business day, thousands of undiagnosed patients could receive a diagnosis each year if their genomes were regularly compared against the latest literature. With millions of genomes expected to be sequenced for rare disease analysis by 2025, and with PubMed currently adding 1.1 million new articles per year, manually reanalyzing the growing number of undiagnosed cases is not sustainable.

Methods: We describe a fully automated reanalysis framework for patients with suspected but undiagnosed Mendelian disorders. We tested the framework by automatically parsing all ~100,000 peer-reviewed papers published each month and matching them, on both genotype and phenotype, against every stored undiagnosed patient. If a new article contains a possible diagnosis for an undiagnosed patient, the system sends an alert. We evaluated the accuracy of this automatic reanalysis on 110 patients, 61 of whom had trio data available.

Results: Even when trained only on older data, our system identifies 80% of reanalysis diagnoses while sending only 0.5-1 alerts per patient per year, a 100- to 1,000-fold efficiency gain over manual literature surveillance of equivalent yield.

Conclusion: We show that automatic reanalysis of patients with suspected Mendelian disease is feasible and has the potential to greatly streamline diagnosis. Our system is not intended to replace clinical judgment. Rather, clinical diagnostic services could benefit greatly from a modest reallocation of time from manual literature exploration to the review of automated reanalysis alerts.

Our system also supports a new paradigm for medical IT systems: proactive, continuously learning, and therefore able to identify valuable insights autonomously as they emerge in digital health records. We have launched automated patient reanalysis, trained on the latest data and offering user accounts and daily literature updates, at https://AMELIE.stanford.edu.
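The monthly matching loop described in the Methods paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not AMELIE's implementation: the abstract says the real system is trained on literature data but does not specify its scoring model, so the gene-overlap gate, the Jaccard phenotype score, the alert-budget rule in `calibrate_threshold`, and all class, field, and function names here are hypothetical. Phenotypes are assumed to be HPO term IDs, and gene names and article IDs are placeholders.

```python
"""Hypothetical sketch of monthly literature-to-patient reanalysis matching.

Not AMELIE's actual model: a gene-overlap gate plus Jaccard phenotype
similarity stands in for the trained scoring the real system uses.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    patient_id: str
    candidate_genes: frozenset[str]  # genes carrying the patient's candidate variants
    phenotypes: frozenset[str]       # phenotype terms, assumed to be HPO IDs


@dataclass(frozen=True)
class Article:
    article_id: str                  # hypothetical identifier (e.g. a PMID)
    genes: frozenset[str]            # genes mentioned in the paper
    phenotypes: frozenset[str]       # phenotypes extracted from the paper


def match_score(patient: Patient, article: Article) -> float:
    """Return 0 unless a gene is shared, else Jaccard overlap of phenotypes."""
    if not patient.candidate_genes & article.genes:
        return 0.0  # genotype gate: the paper must implicate a candidate gene
    union = patient.phenotypes | article.phenotypes
    if not union:
        return 0.0
    return len(patient.phenotypes & article.phenotypes) / len(union)


def monthly_alerts(patients, articles, threshold):
    """Compare every new article with every stored undiagnosed patient."""
    alerts = []
    for patient in patients:
        for article in articles:
            score = match_score(patient, article)
            if score > 0 and score >= threshold:
                alerts.append((patient.patient_id, article.article_id, score))
    return alerts


def calibrate_threshold(yearly_scores, n_patients, alerts_per_patient_year=1.0):
    """Lowest cutoff keeping a year's alerts within the per-patient budget.

    Hypothetical rule: positive scores are scanned from highest to lowest in
    groups of ties; the cutoff stops moving down once the count of scores at
    or above it would exceed the budget.
    """
    budget = alerts_per_patient_year * n_patients
    ranked = sorted((s for s in yearly_scores if s > 0), reverse=True)
    cutoff = float("inf")  # default: no score passes, so no alerts are sent
    i = 0
    while i < len(ranked):
        value = ranked[i]
        j = i
        while j < len(ranked) and ranked[j] == value:
            j += 1
        if j > budget:  # j scores are >= value, which is too many alerts
            break
        cutoff = value
        i = j
    return cutoff


if __name__ == "__main__":
    p1 = Patient("P1", frozenset({"GENE_A", "GENE_B"}),
                 frozenset({"HP:0001250", "HP:0001263"}))
    a1 = Article("article-1", frozenset({"GENE_A"}),
                 frozenset({"HP:0001250", "HP:0001263"}))
    print(monthly_alerts([p1], [a1], threshold=0.5))  # [('P1', 'article-1', 1.0)]
```

Under these assumptions, a cutoff fitted by `calibrate_threshold` on a past year's scores would be passed to `monthly_alerts`, so the alert budget (for example, one alert per patient per year) rather than a fixed score controls how often clinicians are notified.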