A framework to monitor clusters evolution applied to economy and finance problems

The study of evolution has become an important research issue, especially in the last decade, due to our ability to collect and store highly detailed, time-stamped data. The need to describe and understand the behavior of a given phenomenon over time has led to the emergence of new frameworks and methods focused on the temporal evolution of data and models. In this paper we address the problem of monitoring the evolution of clusters over time and propose the MEC framework. MEC traces evolution through the detection and categorization of cluster transitions, such as births, deaths and merges, and enables their visualization through bipartite graphs. It includes a taxonomy of transitions, a tracking method based on the computation of conditional probabilities, and a transition detection algorithm. We use MEC with two main goals: to determine general evolution trends and to detect abnormal behavior or rare events. To demonstrate the applicability of our framework, we present real-world economic and financial case studies, using datasets extracted from the Banco de Portugal Central Balance-Sheet Database and The Data Page of the New York University Leonard N. Stern School of Business. The results allow us to draw conclusions about the evolution of activity sectors and European companies.
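The tracking idea described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes clusters are given as sets of object identifiers at two consecutive time points, estimates the conditional probability of a new cluster given an old one as the fraction of shared members, and keeps an edge in the bipartite graph when that probability reaches a threshold. The function names, the single threshold `tau`, and the classification rules are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def conditional_probabilities(old, new):
    """Estimate P(new_j | old_i) as |old_i & new_j| / |old_i|.

    `old` and `new` map cluster labels to sets of object ids
    (an assumed representation, not taken from the paper).
    """
    probs = {}
    for i, members_i in old.items():
        if not members_i:
            continue
        for j, members_j in new.items():
            p = len(members_i & members_j) / len(members_i)
            if p > 0:
                probs[(i, j)] = p
    return probs


def detect_transitions(old, new, tau=0.3):
    """Classify transitions from the thresholded bipartite graph.

    `tau` is a hypothetical threshold; edges with a conditional
    probability below it are ignored.
    """
    out_edges = defaultdict(list)  # old cluster -> new clusters
    in_edges = defaultdict(list)   # new cluster -> old clusters
    for (i, j), p in conditional_probabilities(old, new).items():
        if p >= tau:
            out_edges[i].append(j)
            in_edges[j].append(i)

    result = {"births": [], "deaths": [], "merges": {},
              "splits": {}, "survivals": []}
    for i in old:
        targets = sorted(out_edges[i])
        if not targets:
            result["deaths"].append(i)        # no successor
        elif len(targets) > 1:
            result["splits"][i] = targets     # several successors
    for j in new:
        sources = sorted(in_edges[j])
        if not sources:
            result["births"].append(j)        # no predecessor
        elif len(sources) > 1:
            result["merges"][j] = sources     # several predecessors
        elif len(out_edges[sources[0]]) == 1:
            result["survivals"].append((sources[0], j))  # one-to-one
    return result


if __name__ == "__main__":
    old = {"A": {1, 2, 3, 4}, "B": {5, 6, 7, 8}, "C": {9, 10}}
    new = {"X": {1, 2, 3, 4, 5, 6, 7}, "Y": {11, 12}}
    print(detect_transitions(old, new))
```

In this toy input, clusters A and B are absorbed into X, C has no successor, and Y has no predecessor. The edges that pass the threshold are the ones a bipartite-graph visualization of this kind would draw between the two time points.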
