Fast Identification of Topic Burst Patterns Based on Temporal Clustering

Temporal text mining is widely used to summarize and track evolving topic trends. In online collaborative systems such as Wikipedia, the edit history of each article is stored as a sequence of revisions. Topics of articles or categories grow and fade over time, and the edit history retains this evolutionary information. This paper studies a particular temporal text mining task: quickly finding burst patterns of topics from phrases extracted from the edit history of Wikipedia articles. We first extract candidate phrases from the edit history using specific features and build a time series of edit frequency for each phrase. Temporal clustering of the burst patterns of phrases then reveals bursts of topics. However, distance measures for temporal clustering, such as dynamic time warping (DTW), are often costly to compute. We propose segmented DTW, which decomposes time series into proper segments and computes the DTW distance within each segment separately. Segmented DTW achieves a reasonable speedup over DTW while still identifying interesting evolutionary topic burst patterns effectively. The method can be applied to tasks such as trend tracking, measuring the temporal relatedness of phrases, and popular topic discovery.
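The idea of computing DTW within segments can be illustrated with a minimal sketch. The abstract does not specify how segments are chosen, so this example assumes both series have equal length and share caller-supplied cut points; the names `dtw`, `segmented_dtw`, and `boundaries` are hypothetical and are not taken from the paper.

```python
from math import inf


def dtw(a, b):
    """Classic DTW distance with absolute-difference cost, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0 if n == m else inf
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def segmented_dtw(a, b, boundaries):
    """Sum of DTW distances computed independently inside aligned segments.

    Assumption (not from the paper): `boundaries` is a sorted list of cut
    indices applied to both series, which have the same length.
    """
    cuts = [0, *boundaries, max(len(a), len(b))]
    return sum(dtw(a[s:e], b[s:e]) for s, e in zip(cuts, cuts[1:]))


# Two edit-frequency series with the same burst, shifted by one step.
a = [0, 0, 1, 5, 1, 0, 0, 0]
b = [0, 1, 5, 1, 0, 0, 0, 0]
full = dtw(a, b)
approx = segmented_dtw(a, b, [4])
```

With k equal segments over series of length n, each segment costs roughly (n/k)^2 cell updates, so the total work drops from about n^2 to about n^2/k. Under these assumptions the segmented value can never be smaller than the full DTW distance, because joining the per-segment warping paths yields one valid full path; a cut that splits a burst therefore inflates the distance, which is why the choice of segments matters.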
