Torpedo: topic periodicity discovery from text data

Although history may not repeat itself, many human activities are inherently periodic, recurring daily, weekly, monthly, yearly, or at some other period. Such recurring activities may not repeat the same set of keywords, but they do share similar topics. It is therefore more informative to mine topic periodicity from text data than to look only at the temporal behavior of a single keyword or phrase. Previous preliminary studies in this direction prespecify a periodic temporal template for each topic. In this paper, we remove this restriction and propose a simple yet effective framework, Torpedo, to mine periodic or recurrent patterns from text such as news articles, search query logs, research papers, and blogs. We first transform text data into topic-specific time series with a time-dependent topic modeling module, where each time series characterizes the temporal behavior of one topic. We then apply time series techniques to detect periodicity. As a result, we both obtain a clear view of how topics are distributed over time and enable the automatic discovery of the periods inherent in each topic. Theoretical and experimental analyses demonstrate the advantage of Torpedo over existing work.
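
The periodicity-detection stage can be illustrated with a minimal sketch. It assumes a topic's time series has already been produced by the topic modeling module; the abstract does not say which time series technique Torpedo uses, so a plain discrete Fourier periodogram stands in here. The function names `periodogram` and `dominant_period` and the synthetic weekly series are hypothetical, chosen only for illustration.

```python
import cmath
import math


def periodogram(series):
    """Return spectral power at each DFT frequency index k = 1 .. n//2.

    The mean is removed first so the constant (k = 0) component
    does not mask the periodic ones.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    powers = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        powers[k] = abs(coeff) ** 2 / n
    return powers


def dominant_period(series):
    """Estimate the period (in time steps) carrying the most spectral power."""
    powers = periodogram(series)
    best_k = max(powers, key=powers.get)
    return len(series) / best_k


# Hypothetical topic strength per day: a smooth weekly cycle over 8 weeks.
topic_series = [0.5 + 0.4 * math.cos(2 * math.pi * day / 7) for day in range(56)]
print(dominant_period(topic_series))  # prints 7.0
```

Because the period is read off the spectrum rather than fixed in advance, no periodic template has to be specified per topic. This direct DFT costs O(n^2) operations; a fast Fourier transform would be the natural replacement for long series.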
