A Multi-Method Approach for Proteomic Network Inference in 11 Human Cancers

Protein expression and post-translational modification levels are tightly regulated in neoplastic cells to maintain cellular processes known as ‘cancer hallmarks’. The first Pan-Cancer initiative of The Cancer Genome Atlas (TCGA) Research Network has aggregated protein expression profiles for 3,467 patient samples from 11 tumor types using the antibody-based reverse-phase protein array (RPPA) technology. The resulting proteomic data can be used to computationally infer protein-protein interaction (PPI) networks and to study the commonalities and differences across tumor types. In this study, we compare 13 established network inference methods in their capacity to retrieve the curated Pathway Commons interactions from RPPA data. We observe that no single method performs best in all tumor types, but a group of six methods, spanning diverse techniques such as correlation, mutual information, and regression, consistently ranks highly among the tested methods. We use these high-performing methods to obtain a consensus network and identify four robust, densely connected modules that reveal biological processes and also suggest antibody-related technical biases. Mapping the consensus network interactions to Reactome gene lists confirms the pan-cancer importance of signal transduction pathways, innate and adaptive immune signaling, cell cycle, metabolism, and DNA repair, and also suggests several biological processes that may be specific to a subset of tumor types. Our results illustrate the utility of the RPPA platform as a tool to study proteomic networks in cancer.
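
To make the idea of a consensus network concrete, the following is a minimal sketch, not the procedure used in the study. It assumes that each method yields a symmetric matrix of edge scores and that the consensus is formed by averaging per-method edge ranks; the abstract does not state the aggregation rule, so rank averaging is an assumption here. Only two simple scoring methods (absolute Pearson correlation and absolute partial correlation) stand in for the methods compared, and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def abs_correlation(X):
    """Edge scores from absolute Pearson correlation (X is samples x proteins)."""
    return np.abs(np.corrcoef(X, rowvar=False))

def abs_partial_correlation(X):
    """Edge scores from absolute partial correlation via the pseudo-inverse covariance."""
    precision = np.linalg.pinv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(precision))
    return np.abs(-precision / np.outer(d, d))

def edge_ranks(scores):
    """Rank the upper-triangle edges of a score matrix; rank 1 is the strongest edge."""
    iu = np.triu_indices_from(scores, k=1)
    order = np.argsort(-scores[iu], kind="stable")
    ranks = np.empty(order.size)
    ranks[order] = np.arange(1, order.size + 1)
    return ranks

def consensus_edges(X, methods, top_k):
    """Average per-method edge ranks (an assumed rule) and return the top_k (i, j) pairs."""
    mean_rank = np.mean([edge_ranks(m(X)) for m in methods], axis=0)
    i, j = np.triu_indices(X.shape[1], k=1)
    best = np.argsort(mean_rank, kind="stable")[:top_k]
    return [(int(i[b]), int(j[b])) for b in best]

def demo():
    """Synthetic stand-in for an RPPA matrix: 200 samples, 5 'proteins'."""
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)  # make proteins 0 and 1 co-vary
    return consensus_edges(X, [abs_correlation, abs_partial_correlation], top_k=3)
```

Calling `demo()` returns the three best-ranked edges for the synthetic data. In a real setting the score functions would be replaced by the chosen inference methods, and the retained edges could be compared against a curated reference such as Pathway Commons.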
