Model simplification for supervised classification of metabolic networks

Many real applications require the representation of complex entities and their relations. Networks are frequently the data structure of choice, owing to their ability to highlight topological and qualitative characteristics. In this work, we are interested in supervised classification models for data in the form of networks. Given two or more classes whose members are networks, we build mathematical models to classify them based on various graph distances. Because the network models involved comprise tens of thousands of nodes and edges, classification is computationally demanding; we therefore focus on model simplification solutions that reduce execution times while maintaining high accuracy. Experimental results on three datasets of biological interest show the performance improvements achieved.
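As a minimal illustration of distance-based network classification, the sketch below makes several assumptions that are not taken from this work. Each network is reduced to its set of directed edges. The edge Hamming distance (the size of the symmetric difference of two edge sets) stands in for a graph distance. A query network receives the label of its nearest training network. The simplification step is likewise only an illustrative assumption, not necessarily the technique used here: edges shared by every training network are dropped. This shrinks the sets being compared, and it shifts all distances from a given query by the same constant, so the nearest-neighbour prediction is unchanged. The function names `edge_distance`, `informative_edges`, `simplify` and `nearest_neighbour` are hypothetical.

```python
from typing import FrozenSet, Hashable, Optional, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]
Graph = FrozenSet[Edge]  # assumption: a network is represented by its directed edge set


def edge_distance(g1: Graph, g2: Graph) -> int:
    """Edge Hamming distance: number of edges present in exactly one of the two graphs."""
    return len(g1 ^ g2)


def informative_edges(train: Sequence[Graph]) -> FrozenSet[Edge]:
    """Edges present in some, but not all, training networks (needs at least one graph).

    Edges shared by every training network never appear in the symmetric
    difference of two training networks, so they carry no class information here.
    """
    union = frozenset().union(*train)
    common = frozenset.intersection(*train)
    return union - common


def simplify(g: Graph, keep: FrozenSet[Edge]) -> Graph:
    """Restrict a network to the kept edges (hypothetical simplification step)."""
    return g & keep


def nearest_neighbour(
    train: Sequence[Graph],
    labels: Sequence[str],
    query: Graph,
    keep: Optional[FrozenSet[Edge]] = None,
) -> str:
    """Return the label of the training network closest to the query.

    If `keep` is given, all networks are first restricted to those edges. Edges
    outside `keep` add the same constant to every distance from the query, so
    the arg-min (including tie-breaking by index) is the same in both cases.
    """
    if keep is not None:
        train = [simplify(g, keep) for g in train]
        query = simplify(query, keep)
    dists = [edge_distance(query, g) for g in train]
    best = min(range(len(dists)), key=dists.__getitem__)
    return labels[best]
```

This reduction preserves the nearest-neighbour decision only because edges outside the kept set contribute a constant to every distance from a given query; other graph distances need not behave this way, so any such simplification has to be checked against the distance actually in use.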
