Multi-hop assortativities for networks classification

Several social, medical, engineering and biological challenges rely on discovering the functionality of networks from their structure and, when it is available, their node metadata. For example, in chemoinformatics one might want to detect whether a molecule is toxic from its structure and atom types; in other settings, one might want to identify the research field of a scientific collaboration network. Existing techniques rely on counting or measuring structural patterns that are known to vary widely from network to network, such as the number of triangles or the assortativity of node metadata. We introduce the concept of multi-hop assortativity, which captures the similarity of the nodes situated at the two ends of a randomly selected path of a given length. We show that multi-hop assortativity unifies various existing concepts and offers a versatile family of 'fingerprints' to characterize networks. These fingerprints in turn make it possible to recover the functionality of a network with the help of standard machine learning tools. We evaluate our method empirically on established social and chemoinformatic network benchmarks. The results show that our assortativity-based features are competitive: they yield highly accurate classifications and often outperform state-of-the-art methods for the network classification task.
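To make the idea of comparing the two ends of a random path concrete, here is a minimal sketch of one possible multi-hop assortativity for categorical node labels. It is an illustration under stated assumptions, not the definition used in the paper. It assumes an undirected graph with no isolated nodes, a random walk started from its stationary distribution (proportional to degree), and a Newman-style normalization of the label mixing matrix. The function name `multi_hop_assortativity` and its parameters are hypothetical.

```python
import numpy as np

def multi_hop_assortativity(adj, labels, hops):
    """Label assortativity between the two ends of a `hops`-step random walk.

    Assumptions (not taken from the paper): `adj` is a symmetric adjacency
    matrix with no zero-degree node, and at least two labels occur, so the
    denominator below is non-zero.
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)

    degrees = adj.sum(axis=1)
    stationary = degrees / degrees.sum()        # start distribution of the walk
    transition = adj / degrees[:, None]         # row-stochastic walk matrix
    walk = np.linalg.matrix_power(transition, hops)

    # One-hot indicator matrix: rows are nodes, columns are label classes.
    classes = np.unique(labels)
    indicator = (labels[:, None] == classes[None, :]).astype(float)

    # mixing[a, b] = probability that a walk starts on label a and ends on b.
    mixing = indicator.T @ (stationary[:, None] * walk) @ indicator

    # Agreement expected if the two endpoints were independent.
    expected = mixing.sum(axis=1) @ mixing.sum(axis=0)
    return (np.trace(mixing) - expected) / (1.0 - expected)


# A 4-cycle with alternating labels: neighbours always differ, while nodes
# two steps apart always share a label.
cycle = [[0, 1, 0, 1],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [1, 0, 1, 0]]
fingerprint = [multi_hop_assortativity(cycle, [0, 1, 0, 1], t) for t in (1, 2, 3)]
```

With `hops=1` this quantity reduces to the usual categorical assortativity coefficient over edges; evaluating it for several values of `hops` gives a short feature vector of the kind the abstract calls a fingerprint, which could then be passed to an off-the-shelf classifier.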
