A molecular map of lung neuroendocrine neoplasms

Abstract

Background: Lung neuroendocrine neoplasms (LNENs) are rare solid cancers, and most genomic studies have included only a limited number of samples. We recently generated the first multi-omic dataset for atypical pulmonary carcinoids and the first methylation dataset for large-cell neuroendocrine carcinomas, which led to the discovery of clinically relevant molecular groups as well as a new entity of pulmonary carcinoids (supra-carcinoids).

Results: To promote the integration of LNEN molecular data, we provide here detailed information on data generation and quality control for whole-genome/exome sequencing, RNA sequencing, and EPIC 850K methylation arrays for a total of 84 patients with LNENs. We integrate the transcriptomic data with previously published data and generate the first comprehensive molecular map of LNENs using the Uniform Manifold Approximation and Projection (UMAP) dimension reduction technique. We show that this map captures the main biological findings of previous studies and can be used as a reference to integrate datasets for which RNA sequencing is available. The map can be interactively explored and interrogated on the UCSC TumorMap portal (https://tumormap.ucsc.edu/?p=RCG_lungNENomics/LNEN). The data, source code, and compute environments used to generate and evaluate the map are available in a Nextjournal interactive notebook (https://nextjournal.com/rarecancersgenomics/a-molecular-map-of-lung-neuroendocrine-neoplasms/), and the raw data are available at the EMBL-EBI European Genome-phenome Archive and Gene Expression Omnibus data repositories.

Conclusions: We provide the data and all resources needed to integrate them with future LNEN transcriptomic studies, allowing meaningful conclusions to be drawn that will eventually lead to a better understanding of this rare, understudied disease.
