PINSPlus: a tool for tumor subtype discovery in integrated genomic data

Summary: Because cancer is a heterogeneous disease, tumor subtyping is crucial for improved treatment and prognosis. We have developed a subtype discovery tool, called PINSPlus, that is: (i) robust against noise and unstable quantitative assays, (ii) able to integrate multiple types of omics data in a single analysis, and (iii) dramatically superior to established approaches in identifying known subtypes and novel subgroups with significant survival differences. Our validation on 12,344 samples from 44 datasets shows that PINSPlus vastly outperforms other approaches. The software is easy to use and can partition hundreds of patients in a few minutes on a personal computer.

Availability: The package is available at https://cran.r-project.org/package=PINSPlus. The data and R script used in this manuscript are available at https://bioinformatics.cse.unr.edu/software/PINSPlus/.

Supplementary information: Supplementary data are available at Bioinformatics online.
