A General Framework: Combining Statistical Models and Network Structure for Deep Content Mining

In this paper, we propose a general framework for content mining that combines statistical models with network structure, leveraging the strengths of both statistical topic models and network-based methods. This approach offers a new perspective on both text-oriented and network-oriented methods. The framework is general: it can be applied to any text collection with an associated network structure.
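The abstract does not specify how the topic model and the network are combined. One common formulation, assumed here purely for illustration, is a network-regularized mixture model. It minimizes a weighted sum of two terms. The first is the negative log-likelihood of a PLSA-style topic model. The second is a graph regularizer that penalizes connected documents for having different topic distributions. The sketch below only evaluates that assumed objective; it does not fit the model. The function names (`plsa_log_likelihood`, `graph_regularizer`, `combined_objective`) and the trade-off weight `lam` are hypothetical, not taken from the paper.

```python
import math


def plsa_log_likelihood(counts, doc_topic, topic_word):
    """Log-likelihood of a PLSA-style mixture model (assumed formulation).

    counts[d][w]     : count of word w in document d
    doc_topic[d][j]  : p(theta_j | d), topic mixture of document d
    topic_word[j][w] : p(w | theta_j), word distribution of topic j
    """
    total = 0.0
    for d, row in enumerate(counts):
        for w, c in enumerate(row):
            if c == 0:
                continue  # unseen words contribute nothing to the likelihood
            p = sum(doc_topic[d][j] * topic_word[j][w] for j in range(len(topic_word)))
            total += c * math.log(p)
    return total


def graph_regularizer(edges, doc_topic):
    """Hypothetical harmonic regularizer over the network.

    edges is a list of (u, v, weight) tuples. The penalty is
    0.5 * sum over edges of weight * sum_j (p(theta_j|u) - p(theta_j|v))^2,
    so it is zero when linked documents share the same topic mixture.
    """
    return 0.5 * sum(
        weight * sum((a - b) ** 2 for a, b in zip(doc_topic[u], doc_topic[v]))
        for u, v, weight in edges
    )


def combined_objective(counts, doc_topic, topic_word, edges, lam=0.5):
    """Assumed objective to minimize: -(1 - lam) * L(C) + lam * R(C, G).

    lam = 0 recovers the plain topic model; lam = 1 uses only the network term.
    """
    likelihood = plsa_log_likelihood(counts, doc_topic, topic_word)
    return -(1 - lam) * likelihood + lam * graph_regularizer(edges, doc_topic)


if __name__ == "__main__":
    # Toy example: two linked documents, two topics, a two-word vocabulary.
    counts = [[3, 1], [1, 3]]
    doc_topic = [[0.9, 0.1], [0.1, 0.9]]
    topic_word = [[0.8, 0.2], [0.2, 0.8]]
    edges = [(0, 1, 1.0)]
    print(combined_objective(counts, doc_topic, topic_word, edges, lam=0.5))
```

In this toy example, the two linked documents lean toward opposite topics, so the network term pulls them toward similar mixtures while the likelihood term keeps them faithful to their words. `lam` sets the balance between the two. In a full method, an EM-style procedure would typically optimize the parameters against such an objective; that procedure is not shown here.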
