Increasing the Predictive Power of Affiliation Networks

Scale is often an obstacle when analyzing large social networks. As a network grows, it becomes harder to interpret and more computationally costly to manipulate and maintain. Here we investigate methods for pruning social networks by identifying their most relevant relationships. We measure importance in terms of predictive accuracy on a set of target attributes of social network groups. Our goal is to create a pruned network that models the most informative affiliations and relationships. We present methods for pruning networks based on both structural properties and descriptive attributes. These pruning approaches can be used in two ways: to decrease the expense of constructing social networks for analysis by reducing the number of relationships that need to be investigated, and as a data reduction technique for approximating or visualizing large graphs. We demonstrate our methods on a network of NASDAQ and NYSE business executives and on a bibliographic network of publications and authors, and show that structural and descriptive pruning increase the predictive power of affiliation networks compared to random pruning.
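To make the three pruning strategies named above concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which structural properties or descriptive attributes are used, so actor degree (number of affiliations) and a generic attribute predicate are assumptions chosen for illustration. The toy data and the function names `prune_structural`, `prune_descriptive`, and `prune_random` are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy affiliation network: (actor, group) pairs.
AFFILIATIONS = [
    ("ann", "boardA"), ("ann", "boardB"), ("ann", "boardC"),
    ("bob", "boardA"), ("bob", "boardB"),
    ("cat", "boardC"),
    ("dan", "boardB"),
]

# Hypothetical descriptive attributes for each actor.
ATTRIBUTES = {
    "ann": {"role": "ceo"},
    "bob": {"role": "director"},
    "cat": {"role": "ceo"},
    "dan": {"role": "director"},
}


def actor_degree(affiliations):
    """Count how many groups each actor belongs to."""
    degree = defaultdict(int)
    for actor, _group in affiliations:
        degree[actor] += 1
    return dict(degree)


def prune_structural(affiliations, min_degree):
    """Keep affiliations whose actor has at least min_degree groups.

    Degree is an assumed stand-in for a structural property.
    """
    degree = actor_degree(affiliations)
    return [(a, g) for a, g in affiliations if degree[a] >= min_degree]


def prune_descriptive(affiliations, attributes, predicate):
    """Keep affiliations whose actor's attributes satisfy predicate."""
    return [(a, g) for a, g in affiliations if predicate(attributes[a])]


def prune_random(affiliations, keep, seed=0):
    """Baseline: keep a uniformly random sample of `keep` affiliations."""
    rng = random.Random(seed)
    return rng.sample(affiliations, keep)


structural = prune_structural(AFFILIATIONS, min_degree=2)
descriptive = prune_descriptive(
    AFFILIATIONS, ATTRIBUTES, lambda attrs: attrs["role"] == "ceo"
)
baseline = prune_random(AFFILIATIONS, keep=len(structural))
```

In an evaluation along the lines the abstract describes, each pruned network would then feed a classifier that predicts a target attribute of the groups, with the random baseline sized to match the pruned network so that accuracies are comparable.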
