The Web as a Jungle: Non-Linear Dynamical Systems for Co-evolving Online Activities

Given a large collection of co-evolving online activities, such as searches for the keywords "Xbox", "PlayStation", and "Wii", how can we find patterns and rules? Are these keywords related? If so, do they compete against each other? Can we forecast the volume of user activity for the coming month? We conjecture that online activities compete for user attention in the same way that species in an ecosystem compete for food. Our first contribution is ECOWEB (i.e., Ecosystem on the Web), an intuitive model, designed as a non-linear dynamical system, for mining large-scale co-evolving online activities. Our second contribution is ECOWEB-FIT, a novel, parameter-free, and scalable fitting algorithm that estimates the parameters of ECOWEB. Extensive experiments on real data show that ECOWEB is effective, in that it captures long-range dynamics and meaningful patterns such as seasonalities, and practical, in that it provides accurate long-range forecasts. ECOWEB consistently outperforms existing methods in both accuracy and execution speed.
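The analogy between keywords and species competing for food is the setting of the classic competitive Lotka-Volterra equations, dP_i/dt = r_i P_i (1 - sum_j a_ij P_j / K_i). The sketch below simulates that textbook system with a forward-Euler step to show how competition shapes co-evolving volumes. It is an illustrative assumption, not the ECOWEB model or the ECOWEB-FIT algorithm, whose exact equations (including how seasonality enters) are not given here. The function `simulate_competition` and its parameters `r` (growth rates), `K` (carrying capacities), `A` (interaction matrix) and `dt` (step size) are hypothetical names.

```python
def simulate_competition(p0, r, K, A, steps, dt=1.0):
    """Forward-Euler simulation of the competitive Lotka-Volterra system
    dP_i/dt = r_i * P_i * (1 - sum_j A[i][j] * P_j / K_i).

    Illustrative only (hypothetical helper, not ECOWEB itself).
    Returns a list of states, one per step, starting with p0.
    """
    d = len(p0)
    p = list(p0)
    trajectory = [list(p0)]
    for _ in range(steps):
        nxt = []
        for i in range(d):
            # Total competitive pressure on activity i from all activities,
            # including itself (A[i][i] is usually 1).
            pressure = sum(A[i][j] * p[j] for j in range(d))
            growth = r[i] * p[i] * (1.0 - pressure / K[i])
            # Clamp at zero: a volume of user activity cannot be negative.
            nxt.append(max(0.0, p[i] + dt * growth))
        p = nxt
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    # Two hypothetical keywords with equal capacity and mild mutual competition.
    A = [[1.0, 0.5], [0.5, 1.0]]
    traj = simulate_competition([1.0, 5.0], r=[0.5, 0.5], K=[100.0, 100.0],
                                A=A, steps=200)
    print(traj[-1])  # both values approach 100 / 1.5, about 66.7
```

With symmetric coupling a_12 = a_21 = 0.5 and equal capacities, the two series coexist, but each settles below its capacity because the other takes part of the available attention. Raising a_12 above 1 while a_21 stays below 1 instead lets the second series drive the first toward zero, the "winner takes all" outcome of competitive exclusion.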
