ICLUS: A robust and scalable clustering model for time series via independent component analysis

Independent component analysis (ICA) is a statistical technique for separating mixed data sources into statistically independent patterns. It is also a useful dimension reduction technique for multivariate data analysis. In this article, we apply ICA to transform multivariate time series data into independent components (ICs), and then develop a clustering algorithm, called ICLUS, that groups time series according to the ICs found. ICLUS is robust to noise, outliers, and differences in scale in the data. It is also scalable: it achieves satisfactory performance when clustering large time series data sets using only a modest number of ICs. The clustering model can be used to cluster financial time series with similar structural patterns. The experiments show that the method is effective and efficient, and that it significantly outperforms other comparable clustering methods, such as distance-based approaches.
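
The general idea of clustering series through their independent components can be illustrated with a minimal sketch. This is not the ICLUS algorithm itself, whose details the abstract does not give. It assumes scikit-learn's `FastICA` as the ICA estimator and k-means as the grouping step, and the function names `make_demo_series` and `cluster_series_by_ics`, the synthetic data, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FastICA


def make_demo_series(seed=0):
    """Build 6 synthetic series (columns) mixed from 2 hidden sources.

    Columns 0-2 are driven mainly by a sine source, columns 3-5 mainly
    by a sawtooth source, each at a different scale and with added noise.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 8.0, 2000)
    s1 = np.sin(2.0 * np.pi * t)        # hidden source 1
    s2 = ((1.7 * t) % 1.0) - 0.5        # hidden source 2 (sawtooth)
    scales = [1.0, 2.0, 0.5]
    cols = [a * s1 + 0.1 * a * s2 for a in scales]
    cols += [0.1 * b * s1 + b * s2 for b in scales]
    X = np.column_stack(cols)
    return X + rng.normal(scale=0.05, size=X.shape)


def cluster_series_by_ics(X, n_components, n_clusters, seed=0):
    """Cluster the columns of X (n_timepoints, n_series) via ICA loadings.

    Assumption: each series is summarised by its row of the estimated
    mixing matrix, i.e. how strongly it loads on each IC.
    """
    # Standardise each series so that scale differences do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    ica.fit(Z)
    loadings = ica.mixing_              # shape: (n_series, n_components)
    # Compare the direction of the loadings, not their magnitude.
    loadings = loadings / np.linalg.norm(loadings, axis=1, keepdims=True)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(loadings)


labels = cluster_series_by_ics(make_demo_series(), n_components=2, n_clusters=2)
print(labels)
```

Because every series is reduced to a vector whose length equals the number of ICs, the clustering step works in a space that is much smaller than the raw series length, which is the sense in which a modest number of ICs can keep the procedure tractable on large data sets.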
