Topic formation and development: a core-group evolving process

Recent years have witnessed increased interest in topic detection and tracking (TDT). However, existing work focuses mainly on overall trend analysis and is not designed to explain how topics evolve. This paper therefore aims to reveal the underlying process of, and reasons for, topic formation and development (TFD). Building on community partitioning in social networks, we propose a core-group model that explains the dynamics of topic development and segments it into stages. The model is inspired by the cell division mechanism in biology. According to the division phase and interphase in the life cycle of a core group, the development of a topic is separated into four states: birth, extending, saturation, and shrinkage. We focus on scientific topic formation and development, using the citation network structure among scientific papers. Experimental results on two real-world data sets show that the division of a core group gives rise to a new scientific topic. The results also reveal that the progress of an entire scientific topic is closely correlated with the growth of a core group during its interphase. Finally, we demonstrate the effectiveness of the proposed method in several real-life scenarios.
