Topic dynamics: an alternative model of bursts in streams of topics

Interest has been growing in the problem of monitoring the occurrence of topics in a stream of events, such as a stream of news articles. This interest has produced several models of bursts in such streams, i.e., periods of elevated occurrence of events. The resulting burst definitions and detection algorithms differ enough that they can produce very different results on the same topic stream. They also share a fundamental limitation: they define bursts in terms of an arrival rate, even though other dimensions of the stream can matter. We reconsider the idea of bursts from the standpoint of a simple kind of physics. Instead of focusing on arrival rates, we reconstruct bursts as a dynamic phenomenon, starting from two basic physical quantities, mass and velocity, and deriving momentum, acceleration, and force from them. We refer to the result as topic dynamics; it permits a hierarchical, expressive model of bursts as intervals of increasing momentum. As a sample application, we present a topic dynamics model for the large PubMed/MEDLINE database of biomedical publications, using the MeSH (Medical Subject Headings) topic hierarchy. We show that our model detects bursts for MeSH terms both accurately and efficiently.
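The idea of a burst as an interval of increasing momentum can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how mass and velocity are measured for a topic, so the per-period `mass` and `velocity` lists below are hypothetical inputs, and the function names `momentum` and `burst_intervals` are chosen only for illustration. The sketch assumes the usual physical definition of momentum (mass times velocity) and reports each maximal run of periods over which momentum strictly rises.

```python
from typing import List, Tuple


def momentum(mass: List[float], velocity: List[float]) -> List[float]:
    """Per-period momentum, assuming momentum = mass * velocity.

    How mass and velocity are measured for a topic is an assumption
    left to the caller; the abstract does not specify it.
    """
    return [m * v for m, v in zip(mass, velocity)]


def burst_intervals(p: List[float]) -> List[Tuple[int, int]]:
    """Return maximal (start, end) index pairs over which p strictly increases."""
    intervals: List[Tuple[int, int]] = []
    start = None  # index where the current rising run began, if any
    for t in range(1, len(p)):
        if p[t] > p[t - 1]:
            if start is None:
                start = t - 1
        elif start is not None:
            # Momentum stopped rising: close the current run.
            intervals.append((start, t - 1))
            start = None
    if start is not None:
        # The series ended while momentum was still rising.
        intervals.append((start, len(p) - 1))
    return intervals


# Hypothetical per-period values for a single topic.
mass = [1, 2, 4, 7, 8, 8]
velocity = [1, 1, 2, 3, 1, 0]
p = momentum(mass, velocity)
bursts = burst_intervals(p)
```

With these made-up values, momentum rises over the first four periods and then falls, so the sketch reports a single burst spanning indices 0 through 3. A real detector would likely smooth the series first, since a strict period-to-period comparison is sensitive to noise.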
