Mining web search topics with diverse spatiotemporal patterns

Mining latent topics from web search data and capturing their spatiotemporal patterns have many applications in information retrieval. Because web search is heavily influenced by spatial and temporal factors, these latent topics usually exhibit a variety of spatiotemporal patterns. Given this diversity, existing models are increasingly ineffective: they either capture only one dimension of the spatiotemporal patterns (spatial or temporal) or simply assume that only one kind of spatiotemporal pattern exists. Such oversimplification risks distorting the latent data structure and hindering downstream use of the discovered topics. In this paper, we introduce the Spatiotemporal Search Topic Model (SSTM), which discovers latent topics from web search data while simultaneously capturing their diverse spatiotemporal patterns. The SSTM flexibly supports diverse spatiotemporal patterns and seamlessly integrates features unique to web search, such as query words, URLs, timestamps, and search sessions. We show that the SSTM is an effective exploratory tool for large-scale web search data and that it outperforms several state-of-the-art topic models in quantitative comparisons.
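The abstract names the web search features SSTM integrates (query words, URLs, timestamps, and search sessions) but does not say how they are encoded. As a rough illustration only, the following Python sketch assumes each search session is treated as one document that pools its query words and clicked URLs while keeping per-search timestamps. The class names, field names, and the `region` field (an assumed stand-in for the spatial signal) are all hypothetical and are not the paper's data format.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Search:
    """One query submission inside a session (hypothetical record layout)."""
    timestamp: datetime
    query_words: list[str]
    clicked_urls: list[str] = field(default_factory=list)


@dataclass
class SearchSession:
    """A search session treated as one 'document' for topic modeling.

    All field names are illustrative assumptions; `region` is a hypothetical
    coarse location standing in for the spatial signal.
    """
    session_id: str
    region: str
    searches: list[Search]

    def word_counts(self) -> Counter:
        # Bag of query words pooled across every search in the session.
        return Counter(w for s in self.searches for w in s.query_words)

    def url_counts(self) -> Counter:
        # Bag of clicked URLs pooled across every search in the session.
        return Counter(u for s in self.searches for u in s.clicked_urls)

    def timestamps(self) -> list[datetime]:
        # Timestamp of each search, in submission order.
        return [s.timestamp for s in self.searches]


# Usage with made-up illustrative data.
session = SearchSession(
    session_id="s1",
    region="Seattle",
    searches=[
        Search(datetime(2013, 7, 4, 9, 30), ["fireworks", "seattle"],
               ["example.org/fireworks"]),
        Search(datetime(2013, 7, 4, 9, 32), ["fireworks", "time"]),
    ],
)
print(session.word_counts())
print(session.url_counts())
```

Grouping by session keeps related queries together, which is one plausible reason to use sessions rather than isolated queries as the modeling unit; whether SSTM does exactly this is not stated here.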
