Clustering of Bi-Dimensional and Heterogeneous Time Series: Application to Social Sciences Data

We present an application of bi-dimensional and heterogeneous time series clustering to a question raised by a social sciences study. The dataset comes from a survey of more than eight thousand handicapped people. Sociologists want to know whether the dataset contains homogeneous classes of people with respect to two attributes: how handicapped people perceive their own quality of life, and their couple status (i.e., whether or not they have a partner). Because both attributes are time series, we adapted the k-means clustering algorithm to handle this kind of data. To this end, we use the Longest Common Subsequence (LCSS) time series distance, chosen for its ability to cope with time stretching, and we extend it to the bi-dimensional and heterogeneous case. Our unsupervised process yields relevant and surprising clusters that sociologists can readily analyze.
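The following is a minimal sketch of an LCSS distance for bi-dimensional, heterogeneous series, built on several assumptions. Each time step is taken to be a pair of a numeric quality-of-life score and a boolean couple status. Two steps are taken to match when their scores differ by at most `epsilon`, their statuses are equal, and their time indices are at most `delta` apart. The distance is then `1 - LCSS / min(n, m)`, the usual normalization for LCSS on trajectories. This abstract does not give the paper's actual matching rule, thresholds, or the way it computes cluster centers. The names `points_match`, `epsilon`, `delta` and `assign_to_centers` are therefore hypothetical, and only the k-means assignment step is shown.

```python
from typing import List, Sequence, Tuple

# One time step: (quality-of-life score, couple status).
# Hypothetical encoding: numeric score, boolean status (True = has a partner).
Point = Tuple[float, bool]


def points_match(a: Point, b: Point, epsilon: float) -> bool:
    """Assumed heterogeneous match rule: numeric parts within epsilon,
    categorical parts strictly equal."""
    return abs(a[0] - b[0]) <= epsilon and a[1] == b[1]


def lcss_length(s: Sequence[Point], t: Sequence[Point],
                epsilon: float, delta: int) -> int:
    """Length of the longest common subsequence of s and t, where two
    elements match under points_match and their indices differ by <= delta."""
    n, m = len(s), len(t)
    # dp[i][j] = LCSS length of the prefixes s[:i] and t[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(i - j) <= delta and points_match(s[i - 1], t[j - 1], epsilon):
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def lcss_distance(s: Sequence[Point], t: Sequence[Point],
                  epsilon: float = 1.0, delta: int = 2) -> float:
    """Distance in [0, 1]: 0 when one series fully matches the other."""
    if not s or not t:
        return 1.0
    return 1.0 - lcss_length(s, t, epsilon, delta) / min(len(s), len(t))


def assign_to_centers(series: Sequence[Sequence[Point]],
                      centers: Sequence[Sequence[Point]],
                      epsilon: float = 1.0, delta: int = 2) -> List[int]:
    """k-means assignment step only: index of the nearest center for each
    series. The center-update (series aggregation) step is not sketched."""
    return [min(range(len(centers)),
                key=lambda k: lcss_distance(x, centers[k], epsilon, delta))
            for x in series]


# Illustrative, made-up series (not survey data).
a = [(6.0, False), (7.0, True), (7.5, True), (8.0, True)]
b = [(6.5, False), (6.0, False), (7.0, True), (8.0, True)]
d = [(6.0, True), (6.0, True), (6.0, True)]
print(lcss_distance(a, b))            # → 0.25
print(assign_to_centers([b, d], [a, d]))  # → [0, 1]
```

Series of different lengths are handled naturally, and the window `delta` bounds how much time stretching is tolerated. Because the categorical component must match exactly, a change in couple status can only be aligned with a matching status in the other series.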
