Classification in Networked Data: a Toolkit and a Univariate Case Study

This paper addresses the classification of entities that are linked to other entities whose classes are known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case study of its application to networked data sets used in prior machine learning research. NetKit is based on a node-centric framework in which each classifier comprises a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, in which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no prior work has systematically evaluated the power of class linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform quite well, well enough that they should be used regularly as baseline classifiers in studies of learning with networked data. The simplest method, which performs remarkably well, highlights the close correspondence between several existing methods introduced for different purposes: Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study also shows that two sets of techniques are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays an important role, similar to that of traditional feature selection.
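The three-component framing described above (local classifier, relational classifier, collective inference) can be illustrated with a minimal sketch of a univariate relational-neighbor classifier. This is not NetKit's code or API: the function name `wvrn_classify`, its parameters, and the synchronous averaging update are assumptions chosen for illustration. The local step is taken to be the class prior among the known labels, the relational step a weighted average of neighbors' class estimates, and the collective inference step a fixed number of synchronous update rounds. Edge weights are assumed to be positive.

```python
from collections import defaultdict


def wvrn_classify(edges, known, classes, n_iter=50):
    """Hypothetical sketch of a weighted-vote relational-neighbor classifier.

    edges:   iterable of (u, v, weight) with positive weights (undirected)
    known:   dict mapping node -> class label for the initially labeled nodes
    classes: list of all class labels
    Returns a dict node -> {class: estimated probability}.
    """
    if not known:
        raise ValueError("at least one labeled node is required")

    # Build an undirected weighted adjacency list.
    nbrs = defaultdict(list)
    for u, v, w in edges:
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))

    # Local step (assumption): unlabeled nodes start at the class prior.
    counts = {c: 0 for c in classes}
    for label in known.values():
        counts[label] += 1
    total = sum(counts.values())
    prior = {c: counts[c] / total for c in classes}

    est = {}
    for node in nbrs:
        if node in known:
            est[node] = {c: 1.0 if known[node] == c else 0.0 for c in classes}
        else:
            est[node] = dict(prior)

    # Collective inference: repeat the relational step for all unlabeled
    # nodes simultaneously; labeled nodes stay clamped to their labels.
    for _ in range(n_iter):
        new_est = {}
        for node in nbrs:
            if node in known:
                new_est[node] = est[node]
                continue
            z = sum(w for _, w in nbrs[node])
            # Relational step: weighted average of neighbors' estimates.
            new_est[node] = {
                c: sum(w * est[v][c] for v, w in nbrs[node]) / z
                for c in classes
            }
        est = new_est
    return est


if __name__ == "__main__":
    # Toy chain a - b - c - d with only the endpoints labeled.
    toy_edges = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0)]
    result = wvrn_classify(toy_edges, {"a": "x", "d": "y"}, ["x", "y"])
    for node in sorted(result):
        print(node, result[node])
```

On the toy chain, the estimates for the unlabeled nodes lean toward the class of the nearer labeled endpoint, which is the kind of behavior that a class-linkage-only baseline relies on. Only links and a few labels are used; no node attributes enter the computation.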
