Where Are We Going? Predicting the Evolution of Individuals

When searching for patterns in data streams, we come across perennial (dynamic) objects that evolve over time. These objects are encountered repeatedly, each time with different definitions and values. Examples are (a) companies listed on a stock exchange that report their progress at the end of each year, and (b) students whose performance is evaluated at the end of each semester. On such data, domain experts also ask how the individual objects will evolve: would it be beneficial to invest in a given company, considering both the company's individual performance thus far and the drift experienced in the model? Or, how will a given student perform next year, given the performance variations observed thus far? While there is much research on how models evolve or change over time [Ntoutsi et al., 2011a], little work has been done on predicting the change of individual objects when the states are not known a priori. In this work, we propose a framework that learns the clusters to which the objects belong at each moment, uses them as ad hoc states in a state-transition graph, and then learns a mixture model of Markov chains that predicts the next most likely state/cluster per object. We evaluate the framework on synthetic and real datasets.
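The prediction step described above (clusters used as ad hoc states, then a mixture of Markov chains that forecasts each object's next state) can be sketched roughly as follows. This is a minimal, hypothetical sketch, not the authors' implementation. It assumes that the per-moment cluster ids have already been computed by some clustering step (not shown), and it treats those ids as a fixed state alphabet shared across all moments, which simplifies away any change of the clusters themselves. The mixture is fitted with a plain EM loop using additive smoothing. All function and parameter names (`fit_markov_mixture`, `predict_next`, `alpha`, `n_iter`) are illustrative.

```python
import math
import random

# Hypothetical sketch: each object's history is assumed to be a precomputed
# list of cluster ids (0..n_states-1), one id per moment.


def _normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def _log_likelihood(seq, init, trans):
    """Log-probability of one state sequence under a single Markov chain."""
    ll = math.log(init[seq[0]])
    for prev, nxt in zip(seq, seq[1:]):
        ll += math.log(trans[prev][nxt])
    return ll


def _posterior(seq, mix, inits, transs):
    """P(component k | seq), computed with the log-sum-exp trick for stability."""
    logs = [math.log(mix[k]) + _log_likelihood(seq, inits[k], transs[k])
            for k in range(len(mix))]
    top = max(logs)
    return _normalize([math.exp(lg - top) for lg in logs])


def fit_markov_mixture(seqs, n_states, n_components, n_iter=50, alpha=0.1, seed=0):
    """Fit a mixture of first-order Markov chains to state sequences with EM.

    seqs:  one list of cluster ids per object, one id per moment.
    alpha: additive smoothing so that no initial or transition probability is zero.
    """
    rng = random.Random(seed)
    # Initialization: a random soft assignment of each object to the components.
    resp = [_normalize([rng.random() + 1e-3 for _ in range(n_components)])
            for _ in seqs]
    for _ in range(n_iter):
        # M-step: mixing weights (with a tiny floor so log() never sees 0),
        # initial-state distributions and transition matrices per component.
        mix = _normalize([sum(r[k] for r in resp) + 1e-9
                          for k in range(n_components)])
        inits, transs = [], []
        for k in range(n_components):
            init = [alpha] * n_states
            trans = [[alpha] * n_states for _ in range(n_states)]
            for seq, r in zip(seqs, resp):
                init[seq[0]] += r[k]
                for prev, nxt in zip(seq, seq[1:]):
                    trans[prev][nxt] += r[k]
            inits.append(_normalize(init))
            transs.append([_normalize(row) for row in trans])
        # E-step: responsibility of each component for each object's history.
        resp = [_posterior(seq, mix, inits, transs) for seq in seqs]
    return mix, inits, transs


def predict_next(history, model):
    """Distribution over the next state of one object, given its state history."""
    mix, inits, transs = model
    post = _posterior(history, mix, inits, transs)
    last = history[-1]
    return [sum(post[k] * transs[k][last][j] for k in range(len(mix)))
            for j in range(len(inits[0]))]


if __name__ == "__main__":
    # Hypothetical toy data: per-moment cluster ids of five objects.
    seqs = [[0, 1, 0, 1, 0], [1, 0, 1, 0, 1], [0, 1, 0, 1],
            [2, 2, 2, 2], [2, 2, 2, 2, 2]]
    model = fit_markov_mixture(seqs, n_states=3, n_components=2)
    for history in ([0, 1, 0], [2, 2]):
        dist = predict_next(history, model)
        best = max(range(len(dist)), key=dist.__getitem__)
        print(history, "-> most likely next state:", best)  # 1, then 2
```

In this sketch the forecast weights each chain by its posterior given the object's own history, so two objects that currently sit in the same cluster can still receive different predictions when their past trajectories differ.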

[1] Marianne Winslett, et al. Scientific and Statistical Database Management, 21st International Conference, SSDBM 2009, New Orleans, LA, USA, June 2-4, 2009, Proceedings, 2009, SSDBM.

[2] Myra Spiliopoulou, et al. Tree Induction over Perennial Objects, 2010, SSDBM.

[3] Peter A. Flach, et al. Evaluation Measures for Multi-class Subgroup Discovery, 2009, ECML/PKDD.

[4] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[5] Padhraic Smyth, et al. Trajectory clustering with mixtures of regression models, 1999, KDD '99.

[6] Saso Dzeroski, et al. Adaptive Windowing for Online Learning from Multiple Inter-related Data Streams, 2011, 2011 IEEE 11th International Conference on Data Mining Workshops.

[7] Ryen W. White, et al. Stream prediction using a generative model based on frequent episodes in event sequences, 2008, KDD.

[8] Myra Spiliopoulou, et al. Online Clustering of High-Dimensional Trajectories under Concept Drift, 2011, ECML/PKDD.

[9] David Taniar, et al. Computational Science and Its Applications - ICCSA 2011, 2011, Lecture Notes in Computer Science.

[10] Vladimir I. Levenshtein, et al. Binary codes capable of correcting deletions, insertions, and reversals, 1965.

[11] Myra Spiliopoulou, et al. MONIC: modeling and monitoring cluster transitions, 2006, KDD '06.

[12] Myra Spiliopoulou, et al. Combining Multiple Interrelated Streams for Incremental Clustering, 2009, SSDBM.

[13] Myra Spiliopoulou, et al. Summarizing Cluster Evolution in Dynamic Environments, 2011, ICCSA.

[14] João Gama, et al. A framework to monitor clusters evolution applied to economy and finance problems, 2012, Intell. Data Anal.