Feature-based classification of networks

Network representations of systems from various scientific and societal domains are neither completely random nor fully regular; instead, they appear to contain recurring structural features. These features tend to be shared by networks belonging to the same broad class, such as the class of social networks or the class of biological networks, and within each such class, networks describing similar systems tend to have similar features. Presumably this is because networks representing similar systems are generated by a shared set of domain-specific mechanisms. If so, it should be possible to classify networks based on their features at various structural levels. Here we describe and demonstrate a new hybrid approach that combines manual selection of network features of potential interest with existing automated classification methods. In particular, we select well-known network features that have been studied extensively in the social network analysis and network science literature, and then classify networks on the basis of these features using methods such as random forests, which are known to handle the feature collinearity that arises in this setting. We find that this approach achieves both higher accuracy and greater interpretability, in shorter computation time, than the other methods we compared it against.
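The abstract does not list the exact feature set or classifier settings. As a rough illustration of the two-step pipeline (hand-picked features, then an ensemble classifier), the sketch below assumes four familiar features (density, mean degree, degree standard deviation, and mean local clustering coefficient) and a deliberately small random forest whose trees are depth-one decision stumps, each fitted on a bootstrap resample and restricted to a random subset of features. It is written in plain Python so it runs without extra libraries. All function names, and the two synthetic classes (ring lattices versus Erdős–Rényi random graphs matched on mean degree), are hypothetical stand-ins, not the authors' data or code.

```python
import random
import statistics
from collections import Counter
from itertools import combinations

def graph_features(adj):
    """Illustrative feature choice (an assumption, not the paper's set); adj = {node: set(neighbours)}."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    density = sum(degrees) / (n * (n - 1)) if n > 1 else 0.0
    clustering = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)
            continue
        # Local clustering: fraction of neighbour pairs that are themselves connected.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        clustering.append(links / (k * (k - 1) / 2))
    return [density, statistics.mean(degrees), statistics.pstdev(degrees),
            statistics.mean(clustering)]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature_ids):
    """Depth-one tree: the threshold split with the lowest weighted Gini impurity."""
    best = (float("inf"), 0, float("inf"), majority(y), majority(y))  # fallback: constant vote
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive observed values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    return best[1:]  # (feature index, threshold, left label, right label)

def fit_forest(X, y, n_trees=25, seed=0):
    """Bagged stumps, each restricted to a random subset of about sqrt(#features) features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    m = max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap resample
        feats = rng.sample(range(n_features), m)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, x):
    """Majority vote over the stumps."""
    votes = [left if x[f] <= t else right for f, t, left, right in forest]
    return majority(votes)

def from_edges(n, edges):
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def ring_lattice(n, k):
    """Each node linked to its k nearest neighbours on either side (high clustering)."""
    return from_edges(n, [(v, (v + j) % n) for v in range(n) for j in range(1, k + 1)])

def random_graph(n, p, rng):
    """Erdős–Rényi G(n, p): each pair linked independently with probability p."""
    return from_edges(n, [e for e in combinations(range(n), 2) if rng.random() < p])

def make_dataset(count, rng):
    """Hypothetical classes: count lattices and count random graphs, expected mean degree 2k."""
    X, y = [], []
    for _ in range(count):
        n, k = rng.randint(25, 40), rng.choice([2, 3])
        X.append(graph_features(ring_lattice(n, k)))
        y.append("lattice")
        X.append(graph_features(random_graph(n, 2 * k / (n - 1), rng)))
        y.append("random")
    return X, y

# Demo on synthetic stand-ins for two network classes.
rng = random.Random(42)
X_train, y_train = make_dataset(20, rng)
X_test, y_test = make_dataset(10, rng)
forest = fit_forest(X_train, y_train)
accuracy = sum(predict(forest, x) == lab for x, lab in zip(X_test, y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Because the two classes are matched on mean degree, density and mean degree carry almost no signal here, and they are also collinear (density equals mean degree divided by n - 1), so the vote is carried by the stumps that split on degree spread or clustering. Reading off which feature and threshold each stump uses is one simple way to see the kind of interpretability a feature-based classifier offers. For real use, a full implementation such as scikit-learn's `RandomForestClassifier`, which grows deeper trees and exposes `feature_importances_`, would replace the toy forest.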