Personalized Annotation-based Networks (PAN) for the Prediction of Breast Cancer Relapse

The classification of clinical samples based on gene expression data is an important part of precision medicine. However, it has proved difficult to accurately predict survival outcomes and treatment responses for cancer patients. In this manuscript, we show how transforming gene expression data into a set of personalized (sample-specific) networks allows us to harness existing graph-based methods to improve classifier performance. Existing approaches to personalized gene networks share one limitation: they depend on the other samples in the data and must be recomputed whenever a new sample is introduced. Here, we propose a novel method, called Personalized Annotation-based Networks (PAN), that avoids this limitation by using curated annotation databases to transform gene expression data into a graph. These databases organize genes into overlapping gene sets, called annotations, which we use to build a network where nodes represent functional terms and edges represent the similarity between them. Unlike competing methods, PANs are calculated for each sample independently of the population, making them a more efficient way to obtain single-sample networks. Using three breast cancer datasets as a case study (METABRIC and a superset of GEO studies), we show that PAN classifiers not only predict cancer relapse better than gene features alone, but also outperform PPI and population-level graph-based classifiers. This work demonstrates the practical advantages of graph-based classification for high-dimensional genomic data, while offering a new approach to building sample-specific networks.

Supplementary information: The code and data are available at https://github.com/thinng/PAN.

Contact: Thuc.Le@unisa.edu.au
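
The network construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (the repository linked above holds that), and the abstract does not specify how node values or edge similarities are defined. The sketch therefore assumes, purely for illustration, that a gene counts as "active" when its expression exceeds a threshold, that a node's value is the mean expression of its active genes, and that an edge weight is the Jaccard similarity between the active gene sets of two terms. The names `build_pan`, `threshold`, and the toy inputs are hypothetical. The point it shows is that the network is computed from one sample and the annotation database only, with no reference to other samples.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_pan(expression, annotations, threshold=1.0):
    """Build an annotation network for a single sample.

    expression:  dict mapping gene -> expression value for ONE sample
    annotations: dict mapping functional term -> set of genes
    threshold:   assumed cut-off above which a gene counts as active

    Returns (nodes, edges):
      nodes: term -> mean expression of the term's active genes
      edges: (term_a, term_b) -> Jaccard similarity of active gene sets
    """
    # Restrict every annotation to the genes active in this sample.
    active = {
        term: {g for g in genes if expression.get(g, 0.0) > threshold}
        for term, genes in annotations.items()
    }

    # Nodes are functional terms with at least one active gene.
    nodes = {
        term: sum(expression[g] for g in genes) / len(genes)
        for term, genes in active.items()
        if genes
    }

    # Edges connect terms whose active gene sets overlap.
    edges = {}
    for a, b in combinations(sorted(nodes), 2):
        sim = jaccard(active[a], active[b])
        if sim > 0.0:
            edges[(a, b)] = sim
    return nodes, edges


# Toy inputs (hypothetical, not taken from any annotation database).
toy_annotations = {
    "T1": {"A", "B", "C"},
    "T2": {"B", "C", "D"},
    "T3": {"E"},
}
toy_sample = {"A": 2.0, "B": 0.1, "C": 3.0, "D": 1.5, "E": 0.0}

nodes, edges = build_pan(toy_sample, toy_annotations)
print(nodes)
print(edges)
```

Because `build_pan` takes a single expression profile, adding a new sample requires no recomputation for the existing ones. Graph-level features of each network could then be passed to a standard classifier; which features and which classifier to use is left open here.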
