On the Use of Topological Features of Metabolic Networks for the Classification of Cancer Samples

Background: The increasing availability of omics data collected from patients affected by severe pathologies, such as cancer, is fostering the development of data science methods for their analysis.

Introduction: The combination of data integration and machine learning approaches can provide powerful new instruments to tackle the complexity of cancer development and deliver effective diagnostic and prognostic strategies.

Methods: We explore the possibility of exploiting the topological properties of sample-specific metabolic networks as features in a supervised classification task. Such networks are obtained by projecting transcriptomic data from RNA-seq experiments onto genome-wide metabolic models, yielding weighted networks that model the overall metabolic activity of a given sample.

Results: We show the classification results on a labeled breast cancer dataset from the TCGA database, including 210 samples (cancer vs. normal). In particular, we investigate how performance is affected by threshold-based pruning of the networks, comparing Artificial Neural Networks, Support Vector Machines (SVMs) and Random Forests. Interestingly, the best classification performance is achieved within a small threshold range for all methods, suggesting that pruning within this range might be an effective way to recover useful information while filtering out noise from the data. Overall, the best accuracy is achieved with SVMs, whose performance is similar to that obtained when gene expression profiles are used as features.

Conclusion: These findings demonstrate that the topological properties of sample-specific metabolic networks are effective in classifying cancer and normal samples, suggesting that useful information can be extracted from a relatively limited number of features.
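As a rough illustration of the threshold-based pruning and topological feature extraction described in the abstract, the following minimal sketch prunes a toy weighted network at several thresholds and computes a few simple descriptors. Everything in it is an assumption made for illustration: the abstract does not state which topological features were used, how edge weights are scaled, or whether pruning removes edges below or above the threshold. The names `prune`, `topological_features`, and the toy network are hypothetical.

```python
from statistics import mean


def prune(edges, threshold):
    """Keep only edges whose weight is at least `threshold`.

    Assumption: low-weight edges are treated as noise and removed.
    """
    return {pair: w for pair, w in edges.items() if w >= threshold}


def topological_features(edges, nodes):
    """Compute a small, illustrative feature vector for an undirected weighted network."""
    degree = {n: 0 for n in nodes}
    strength = {n: 0.0 for n in nodes}
    for (u, v), w in edges.items():
        degree[u] += 1
        degree[v] += 1
        strength[u] += w
        strength[v] += w
    n = len(nodes)
    max_edges = n * (n - 1) / 2  # number of possible undirected edges
    return {
        "n_edges": len(edges),
        "density": len(edges) / max_edges if max_edges else 0.0,
        "mean_degree": mean(degree.values()),
        "mean_strength": mean(strength.values()),
        "n_isolated": sum(1 for d in degree.values() if d == 0),
    }


# Toy sample-specific network: nodes stand in for metabolic entities,
# weights for expression-derived activity scores (hypothetical values).
nodes = ["A", "B", "C", "D"]
sample_network = {
    ("A", "B"): 0.9,
    ("B", "C"): 0.2,
    ("C", "D"): 0.6,
    ("A", "D"): 0.05,
}

# One feature vector per pruning threshold.
features_by_threshold = {
    t: topological_features(prune(sample_network, t), nodes)
    for t in (0.0, 0.1, 0.5)
}
```

In a setting like the one the abstract describes, one such feature vector would be computed per sample at a fixed threshold, and the resulting table would be passed to a supervised classifier; the threshold would then be varied to see how classification performance changes.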
