SIFORM: shared informative factor models for integration of multi-platform bioinformatic data

MOTIVATION: High-dimensional omic data derived from different technological platforms have been extensively used to facilitate comprehensive understanding of disease mechanisms and to determine personalized health treatments. Numerous studies have integrated multi-platform omic data; however, few have efficiently and simultaneously addressed the problems that arise from high dimensionality and complex correlations.

RESULTS: We propose a statistical framework of shared informative factor models that can jointly analyze multi-platform omic data and explore their associations with a disease phenotype. The common disease-associated sample characteristics across different data types can be captured through the shared structure space, while the corresponding weights of genetic variables directly index the strengths of their association with the phenotype. Extensive simulation studies demonstrate the performance of the proposed method in terms of biomarker detection accuracy via comparisons with three popular regularized regression methods. We also apply the proposed method to The Cancer Genome Atlas lung adenocarcinoma dataset to jointly explore associations of mRNA expression and protein expression with smoking status. Many of the identified biomarkers belong to key pathways for lung tumorigenesis, some of which are known to show differential expression across smoking levels. We discover potential biomarkers that reveal different mechanisms of lung tumorigenesis between light smokers and heavy smokers.

AVAILABILITY AND IMPLEMENTATION: R code to implement the new method can be downloaded from http://odin.mdacc.tmc.edu/jhhu/

CONTACT: jhu@mdanderson.org
