It's who you know: graph mining using recursive structural features

Given a graph, how can we extract good features for the nodes? For example, given two large graphs from the same domain, how can we use information in one to do classification in the other (i.e., perform across-network classification or transfer learning on graphs)? Also, if one of the graphs is anonymized, how can we use information in the other graph to de-anonymize it? The key step in all such graph mining tasks is to find effective node features. We propose ReFeX (Recursive Feature eXtraction), a novel algorithm that recursively combines local (node-based) features with neighborhood (egonet-based) features and outputs regional features that capture "behavioral" information. We demonstrate how these regional features can be used in within-network and across-network classification and de-anonymization tasks, without relying on homophily or the availability of class labels. The contributions of our work are as follows: (a) ReFeX is scalable, and (b) it is effective, capturing regional ("behavioral") information in large graphs. We report experiments on real graphs from various domains with over 1M edges, where ReFeX outperforms its competitors on typical graph mining tasks such as network classification and de-anonymization.
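
The recursive combination of local and egonet features described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say which base features or aggregation functions are used, so the three base features (degree, edges within the egonet, edges leaving the egonet), the sum and mean aggregators, the number of rounds, and the names `base_features` and `recursive_features` are all assumptions made for illustration. The sketch also omits any pruning of redundant features.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def base_features(graph: Graph) -> Dict[int, List[float]]:
    """Assumed base features: degree, edges inside the egonet, edges leaving it."""
    feats: Dict[int, List[float]] = {}
    for node, nbrs in graph.items():
        ego = nbrs | {node}
        # Each within-egonet edge is counted from both endpoints, so halve the total.
        within = sum(len(graph[u] & ego) for u in ego) // 2
        # Edges with exactly one endpoint inside the egonet.
        leaving = sum(len(graph[u] - ego) for u in ego)
        feats[node] = [float(len(nbrs)), float(within), float(leaving)]
    return feats


def recursive_features(graph: Graph, rounds: int = 2) -> Dict[int, List[float]]:
    """Append, per round, the sum and mean over neighbors of the latest features."""
    feats = base_features(graph)
    current = feats  # features produced by the most recent round
    for _ in range(rounds):
        width = len(next(iter(current.values()), []))
        new: Dict[int, List[float]] = {}
        for node, nbrs in graph.items():
            sums = [sum(current[u][i] for u in nbrs) for i in range(width)]
            means = [s / len(nbrs) if nbrs else 0.0 for s in sums]
            new[node] = sums + means
        feats = {n: feats[n] + new[n] for n in graph}
        current = new
    return feats
```

Each round aggregates the previous round's output, so a node's feature vector summarizes structure one hop further away per round. That is the sense in which the features become "regional" in this sketch. Because the inputs are purely structural, no class labels are needed to compute them.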
