It's who you know: graph mining using recursive structural features

Given a graph, how can we extract good features for its nodes? For example, given two large graphs from the same domain, how can we use information in one to perform classification in the other (i.e., across-network classification, or transfer learning on graphs)? And if one of the graphs is anonymized, how can we use information in the other to de-anonymize it? The key step in all such graph mining tasks is finding effective node features. We propose ReFeX (Recursive Feature eXtraction), a novel algorithm that recursively combines local (node-based) features with neighborhood (egonet-based) features and outputs regional features that capture "behavioral" information. We demonstrate how these regional features can be used in within-network and across-network classification and in de-anonymization tasks, without relying on homophily or on the availability of class labels. The contributions of our work are as follows: (a) ReFeX is scalable, and (b) it is effective, capturing regional ("behavioral") information in large graphs. We report experiments on real graphs from various domains with over 1M edges, where ReFeX outperforms its competitors on typical graph mining tasks such as network classification and de-anonymization.
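To make the recursion concrete, here is a minimal Python sketch of the idea described above. It is not the authors' implementation, and every name in it is hypothetical. It assumes an undirected, unweighted graph stored as a dict of neighbor sets. The specific choices are illustrative assumptions: the base features are degree plus the number of edges inside and leaving each node's egonet; each recursive round appends the sum and mean of the neighbors' previous-round features; and no redundant-feature pruning is done. The helper `closest_node` shows one simple way such features could support de-anonymization: match a node to the candidate with the nearest feature vector.

```python
import math
from typing import Dict, List, Set

# Assumed representation: undirected graph as {node: set of neighbors}, no self-loops.
Graph = Dict[int, Set[int]]
Features = Dict[int, List[float]]


def base_features(g: Graph) -> Features:
    """Local (degree) and egonet features; this particular set is an assumption."""
    feats: Features = {}
    for v, nbrs in g.items():
        ego = nbrs | {v}  # egonet = the node plus its neighbors
        # Edges with both endpoints in the egonet; each is seen twice, hence // 2.
        internal = sum(1 for u in ego for w in g[u] if w in ego) // 2
        # Edges with exactly one endpoint in the egonet (counted from the inside end).
        boundary = sum(1 for u in ego for w in g[u] if w not in ego)
        feats[v] = [float(len(nbrs)), float(internal), float(boundary)]
    return feats


def recursive_features(g: Graph, iterations: int = 2) -> Features:
    """Append neighbor sums and means of the previous round's features, `iterations` times."""
    feats = base_features(g)
    current = feats  # features produced in the most recent round
    for _ in range(iterations):
        new: Features = {}
        for v, nbrs in g.items():
            k = len(current[v])
            sums = [sum((current[u][i] for u in nbrs), 0.0) for i in range(k)]
            means = [s / len(nbrs) if nbrs else 0.0 for s in sums]
            new[v] = sums + means
        # Each node's vector grows by the newly aggregated features.
        feats = {v: feats[v] + new[v] for v in g}
        current = new
    return feats


def closest_node(target: List[float], candidates: Features) -> int:
    """Return the candidate node whose feature vector is nearest (Euclidean) to target."""
    return min(candidates, key=lambda u: math.dist(target, candidates[u]))


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant node 3 attached to node 2.
    g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    # "Anonymized" copy: same structure, node ids shifted by 10.
    anon = {v + 10: {u + 10 for u in nbrs} for v, nbrs in g.items()}

    f, fa = recursive_features(g), recursive_features(anon)
    print(len(f[0]))               # 3 + 6 + 12 = 21 features
    print(closest_node(f[2], fa))  # 12
```

Because these features depend only on graph structure, relabeling the nodes leaves each node's vector unchanged, which is why node 2 is matched to its relabeled copy 12. Structurally equivalent nodes (0 and 1 in the example) get identical vectors and cannot be told apart this way. Raw Euclidean distance is also sensitive to feature scale, so some normalization would likely be needed on real graphs.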
