Two-phase network generation towards within-network classifiers evaluation

Within-network classifiers have been widely used to predict unknown data in networks. To evaluate the performance of existing classifiers, it is essential to generate synthetic networks with various properties. However, conventional network generation methods are ineffective in this scenario, since they are unable to produce node labels, impose topological constraints, or provide stable generation performance. In this paper, we propose a novel network generation method for evaluating within-network classifiers, which consists of two generation phases. In the first phase, topology generation, the network topology can be obtained from any existing topology generation model. In the second phase, label generation, we model the problem as a multi-objective optimization. Specifically, we prove that generating node labels over an existing topology subject to a homophily constraint is NP-hard, and we devise a genetic-algorithm-based strategy for node label generation. Extensive experiments demonstrate that our method produces synthetic networks with stable properties, keeps the network topology fixed, and lets the label parameters take effect independently, which makes it suitable for evaluating the sensitivity of classifiers to different parameters.
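The two-phase idea can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract gives no implementation details, so everything below is an assumption made for illustration. The topology phase uses a plain Erdős–Rényi generator as a stand-in for "any existing topology generation model", labels are binary, homophily is measured as the fraction of edges joining same-label nodes, and the two objectives (target homophily and target positive-class ratio) are collapsed into one weighted sum instead of being treated as a true multi-objective problem. All function and parameter names are hypothetical.

```python
import random


def random_topology(n, p, rng):
    """Phase 1 stand-in: Erdos-Renyi G(n, p), returned as an edge list."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


def homophily(edges, labels):
    """Fraction of edges whose endpoints share a label (assumed definition)."""
    if not edges:
        return 0.0
    return sum(1 for u, v in edges if labels[u] == labels[v]) / len(edges)


def fitness(edges, labels, target_h, target_pos):
    """Scalarized objective: distance to target homophily plus distance to
    target positive-class ratio. Lower is better."""
    pos_ratio = sum(labels) / len(labels)
    return abs(homophily(edges, labels) - target_h) + abs(pos_ratio - target_pos)


def generate_labels(edges, n, target_h, target_pos, rng,
                    pop_size=40, generations=200, mutation_rate=0.02):
    """Phase 2: a simple elitist genetic algorithm over binary label vectors.
    The topology (edges) is only read, never modified."""
    def score(ind):
        return fitness(edges, ind, target_h, target_pos)

    population = [[1 if rng.random() < target_pos else 0 for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        elite = population[: pop_size // 2]        # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)            # pick two parents
            cut = rng.randrange(1, n)              # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each label with a small probability
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = elite + children
    return min(population, key=score)


if __name__ == "__main__":
    rng = random.Random(0)
    edges = random_topology(60, 0.1, rng)
    labels = generate_labels(edges, 60, target_h=0.8, target_pos=0.5, rng=rng)
    print("homophily:", round(homophily(edges, labels), 3))
    print("positive ratio:", round(sum(labels) / len(labels), 3))
```

Because the edge list is produced once and then only read, the homophily and class-ratio targets can be varied while the topology stays fixed, which mirrors the separation of phases described above. A heuristic search is used here because the abstract states that the exact labeling problem is NP-hard.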
