A k-mer based transcriptomics analysis for NPM1-mutated AML

Motivation: Acute myeloid leukemia (AML) is a highly heterogeneous disease. Although current classifications are well known and widely adopted, many patients experience drug resistance and disease relapse. New biomarkers are needed to make classifications more reliable and to propose personalized treatments.

Results: We performed large-scale tests on three AML cohorts comprising 1112 RNA-seq samples. Machine learning models distinguished NPM1-mutated from non-mutated patients with more than 95% accuracy in three different scenarios. Using our approach, we recovered genes previously described as associated with NPM1 mutations and identified new genes to be investigated. Furthermore, we provide a new perspective, at the k-mer level, for searching for signatures and biomarkers and for exploring diagnosis and prognosis.

Availability: Code is available at https://github.com/railorena/npm1aml and https://osf.io/4s9tc/. The cohorts used in this article were authorized for use.
