Non-Linear Mining of Competing Local Activities

Given a large collection of time-evolving activities, such as Google search queries, consisting of d keywords/activities for m locations over a duration n, how can we analyze the temporal patterns and relationships among all these activities and find location-specific trends? How can we capture the non-linear evolution of local activities and forecast future patterns? For example, assume that we have the online search volume for multiple keywords, e.g., "Nokia/Nexus/Kindle" or "CNN/BBC", for 236 countries/territories, from 2004 to 2015. Our goal is to analyze a large collection of multi-evolving activities and, specifically, to answer the following questions: (a) Is there any sign of interaction/competition between two different keywords? If so, which keyword competes with which? (b) In which country is the competition strong? (c) Are there any seasonal/annual activities? (d) How can we automatically detect important world-wide (or local) events? We present COMPCUBE, a unifying non-linear model that provides a compact and powerful representation of co-evolving activities, together with a novel fitting algorithm, COMPCUBE-FIT, which is parameter-free and scalable. Our method captures the following important patterns: (B)asic trends, i.e., non-linear dynamics of co-evolving activities; signs of (C)ompetition and latent interaction, e.g., Nokia vs. Nexus; (S)easonality, e.g., a Christmas spike for iPod in the U.S. and Europe; and (D)eltas, e.g., unrepeated local events such as the U.S. election in 2008. Thanks to its concise but effective summarization, COMPCUBE can also forecast long-range future activities. Extensive experiments on real datasets demonstrate that COMPCUBE consistently outperforms state-of-the-art methods in terms of both accuracy and execution speed.
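The abstract does not state the model equations, so the following is only an illustrative sketch of what "non-linear dynamics" with "competition" between keywords could look like. It assumes a Lotka-Volterra-style competition system for d keywords in a single location; this functional form, the function name simulate_competition, and all parameter names and values (p0, r, K, A, dt) are assumptions made for this example, not the COMPCUBE model or the COMPCUBE-FIT algorithm.

```python
from typing import List


def simulate_competition(
    p0: List[float],       # initial popularity of each keyword (hypothetical)
    r: List[float],        # intrinsic growth rate of each keyword (hypothetical)
    K: List[float],        # carrying capacity of each keyword (hypothetical)
    A: List[List[float]],  # A[i][j]: pressure keyword j puts on keyword i
    n_steps: int,
    dt: float = 0.1,
) -> List[List[float]]:
    """Euler-integrate an assumed competition system

        dP_i/dt = r_i * P_i * (1 - sum_j A[i][j] * P_j / K_i)

    and return the trajectory as a list of n_steps + 1 states.
    """
    d = len(p0)
    p = list(p0)
    series = [list(p)]
    for _ in range(n_steps):
        nxt = []
        for i in range(d):
            # Total competitive pressure on keyword i, including itself.
            pressure = sum(A[i][j] * p[j] for j in range(d))
            growth = r[i] * p[i] * (1.0 - pressure / K[i])
            # Popularity cannot go negative.
            nxt.append(max(p[i] + dt * growth, 0.0))
        p = nxt
        series.append(list(p))
    return series


# Two made-up keywords: keyword 1 suffers more from keyword 0 than vice versa.
series = simulate_competition(
    p0=[1.0, 1.0],
    r=[0.5, 0.5],
    K=[100.0, 100.0],
    A=[[1.0, 0.5], [1.5, 1.0]],
    n_steps=2000,
)
final_a, final_b = series[-1]
print(f"final popularity: keyword 0 = {final_a:.2f}, keyword 1 = {final_b:.2f}")
```

Under these made-up parameters the off-diagonal entries of A are asymmetric, so one keyword eventually crowds out the other. In this toy setting that is the kind of behavior question (a) asks about. Seasonality and one-off deltas are not modeled in this sketch.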
