HSN-PAM: Finding Hierarchical Probabilistic Groups from Large-Scale Networks

Real-world social networks are often hierarchical, reflecting the fact that some communities are composed of a few smaller sub-communities. This paper describes a scheme based on a hierarchical Bayesian model, namely HSN-PAM (Hierarchical Social Network-Pachinko Allocation Model), for discovering probabilistic, hierarchical communities in social networks. This scheme is powered by a previously developed hierarchical Bayesian model. In this scheme, communities are classified into two categories: super-communities and regular communities. Two different network encoding approaches are explored to evaluate this scheme on research collaboration networks, including CiteSeer and NanoSCI. The experimental results demonstrate that HSN-PAM is effective for discovering hierarchical community structures in large-scale social networks.
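
To make the two-level community structure concrete, the following is a minimal sketch of a Pachinko-allocation-style generative process applied to a network. It is not the paper's implementation. It assumes, purely for illustration, that each actor plays the role of a document and each of its social ties plays the role of a word; the function names, the symmetric Dirichlet prior `alpha`, and all sizes are hypothetical.

```python
import random


def sample_dirichlet(rng, alpha, size):
    """Draw one sample from a symmetric Dirichlet(alpha) of dimension `size`."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(rng, probs):
    """Draw an index in range(len(probs)) with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_ties(n_actors, n_super, n_regular, ties_per_actor, alpha=0.5, seed=0):
    """Generate (actor, super_community, regular_community, partner) tuples.

    Assumed structure (illustrative only): actor -> super-community ->
    regular community -> partner actor.
    """
    rng = random.Random(seed)
    # Each regular community is a distribution over actors, shared by everyone.
    regular_to_actor = [
        sample_dirichlet(rng, alpha, n_actors) for _ in range(n_regular)
    ]
    ties = []
    for actor in range(n_actors):
        # Per-actor mixture over super-communities.
        actor_to_super = sample_dirichlet(rng, alpha, n_super)
        # Per-actor mixture over regular communities, one per super-community.
        super_to_regular = [
            sample_dirichlet(rng, alpha, n_regular) for _ in range(n_super)
        ]
        for _ in range(ties_per_actor):
            s = sample_index(rng, actor_to_super)
            r = sample_index(rng, super_to_regular[s])
            partner = sample_index(rng, regular_to_actor[r])
            ties.append((actor, s, r, partner))
    return ties


if __name__ == "__main__":
    for tie in generate_ties(n_actors=5, n_super=2, n_regular=3, ties_per_actor=2):
        print(tie)
```

Under these assumptions, inference would run in the opposite direction: given only the observed (actor, partner) pairs, estimate which super-community and regular community most plausibly produced each tie.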
