Developing an Integrated Genomic Profile for Cancer Patients with the Use of NGS Data

Next Generation Sequencing (NGS) technologies have revolutionized genomics research by enabling high-throughput sequencing of genetic material, yielding data from different sources such as Whole Exome Sequencing (WES) and RNA Sequencing (RNAseq). Exploiting and integrating this wealth of heterogeneous sequencing data remains a major challenge. There is a clear need for approaches that process and combine these sources into an integrated patient profile that gives a more complete picture of a disease. This work introduces such an integrated profile, using Chronic Lymphocytic Leukemia (CLL) as the exemplary cancer type. The approach links the various NGS sources with the patients’ clinical data. The resulting profile summarizes the large-scale datasets, links the results with the patient's clinical profile, and correlates indicators arising from the different data types. These indicators, together with the associated clinical information, served as the feature pool for classification with state-of-the-art machine learning techniques, which made it possible to build efficient predictive models. To ensure reproducibility of the results, only open data were used in the classification assessment. The final goal is to design a complete genomic profile of a cancer patient. The profile includes a summary and visualization of the WES and RNAseq results (specific variants and significantly expressed genes, respectively) and of the clinical profile, an integration and comparison of these results, and a prediction of the disease trajectory. In conclusion, this work produces a comprehensive clinico-genetic profile of a patient by integrating heterogeneous data sources. The proposed profile can contribute to medical research by opening new possibilities in personalized medicine and prognosis.
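As a rough illustration of the integration step described above, the following Python sketch combines per-patient WES variants, RNAseq results, and clinical data into one record, and then derives a binary feature row that a classifier could consume. It is not the pipeline used in this work: the function names (`build_profile`, `feature_vector`), the input field names (`gene`, `padj`), the 0.05 significance cutoff, and the toy patient values are all assumptions made for the example.

```python
def build_profile(patient_id, wes_variants, rnaseq_genes, clinical, padj_cutoff=0.05):
    """Combine three per-patient sources into one summary record.

    All field names and the padj_cutoff default are hypothetical choices
    for this sketch, not values taken from the paper.
    """
    # Genes carrying at least one reported variant in the WES analysis.
    variant_genes = {v["gene"] for v in wes_variants}
    # Genes whose expression change passes the assumed significance cutoff.
    expressed_genes = {g["gene"] for g in rnaseq_genes if g["padj"] < padj_cutoff}
    return {
        "patient_id": patient_id,
        "clinical": dict(clinical),
        "variant_genes": sorted(variant_genes),
        "expressed_genes": sorted(expressed_genes),
        # Integration/comparison: genes flagged by both data types.
        "shared_genes": sorted(variant_genes & expressed_genes),
    }


def feature_vector(profile, gene_panel):
    """Return two binary indicators per panel gene: mutated, then expressed."""
    row = []
    for gene in gene_panel:
        row.append(1 if gene in profile["variant_genes"] else 0)
        row.append(1 if gene in profile["expressed_genes"] else 0)
    return row


# Toy input, invented purely for illustration.
example = build_profile(
    "P001",
    wes_variants=[{"gene": "TP53"}, {"gene": "NOTCH1"}],
    rnaseq_genes=[
        {"gene": "NOTCH1", "padj": 0.01},
        {"gene": "EGR2", "padj": 0.2},
        {"gene": "SF3B1", "padj": 0.001},
    ],
    clinical={"age": 63},
)
print(example["shared_genes"])
print(feature_vector(example, ["TP53", "NOTCH1", "SF3B1"]))
```

Rows built this way for a cohort, extended with encoded clinical fields, would form the kind of feature pool the abstract refers to; the choice of classifier is left open here.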
