Stock market co-movement assessment using a three-phase clustering method

An automatic stock market categorization system would be invaluable to individual investors and financial experts, because it would allow them to predict changes in a company's stock price relative to those of other companies. In recent years, clustering the companies in a stock market by the similarity in shape of their stock price series has become a common approach. However, existing approaches are impractical for two reasons: stock price data are high-dimensional, and similar price changes usually occur with a time shift, which makes the categorization more complex. Moreover, no existing stock market categorization method clusters companies down to the sub-cluster level, a level of detail that is very meaningful to end users. Therefore, this paper proposes a novel three-phase clustering model that categorizes companies based on the similarity in shape of their stock price series. In the first phase, low-resolution time series data are used to categorize companies approximately. In the second phase, the pre-clustered companies are split into pure sub-clusters. In the third phase, the sub-clusters are merged. The accuracy of the proposed method is evaluated using various published data sets from different domains. The results show that the approach performs well in both efficiency and effectiveness compared with existing conventional clustering algorithms.
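The three phases can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not name the representation, distance measures, or clustering algorithms, so everything concrete here is an assumption made for illustration. The sketch assumes z-normalized series of equal length, piecewise aggregate approximation (PAA) as the low-resolution form, a simple leader-style threshold grouping in place of a real clustering algorithm, Euclidean distance in phases 1 and 2, pointwise-mean prototypes, and dynamic time warping (DTW) as the shift-tolerant distance in phase 3. The function names and the radii `r1`, `r2`, `r3` are hypothetical.

```python
from statistics import mean

def znorm(s):
    # Rescale to zero mean and unit variance so only the shape matters.
    m = mean(s)
    sd = (sum((x - m) ** 2 for x in s) / len(s)) ** 0.5
    return [(x - m) / sd if sd else 0.0 for x in s]

def paa(s, segments):
    # Low-resolution form: mean of each of `segments` chunks (needs len(s) >= segments).
    n = len(s)
    return [mean(s[i * n // segments:(i + 1) * n // segments]) for i in range(segments)]

def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def dtw(a, b):
    # Classic dynamic-programming DTW with squared point costs, kept row by row.
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, 1):
            cur.append((x - y) ** 2 + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1] ** 0.5

def group(ids, vec, dist, radius):
    # Assumed stand-in for a clustering step: join the first group whose
    # leader (first member) lies within `radius`, else start a new group.
    groups = []
    for i in ids:
        for g in groups:
            if dist(vec[g[0]], vec[i]) <= radius:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

def three_phase(series, segments=4, r1=1.0, r2=0.2, r3=0.5):
    full = {k: znorm(v) for k, v in series.items()}
    low = {k: paa(v, segments) for k, v in full.items()}
    # Phase 1: approximate pre-clusters on the low-resolution series.
    pre = group(list(series), low, euclid, r1)
    # Phase 2: split each pre-cluster into tight sub-clusters at full resolution.
    subs = [g for p in pre for g in group(p, full, euclid, r2)]
    # Phase 3: merge sub-clusters whose mean prototypes are close under DTW.
    proto = {i: [mean(col) for col in zip(*(full[k] for k in g))]
             for i, g in enumerate(subs)}
    merged = group(list(proto), proto, dtw, r3)
    return [sorted(k for i in m for k in subs[i]) for m in merged]

prices = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8],        # rising
    "B": [2, 4, 6, 8, 10, 12, 14, 16],    # same shape as A, different scale
    "C": [1, 1, 2, 3, 4, 5, 6, 7],        # A delayed by one step
    "D": [8, 7, 6, 5, 4, 3, 2, 1],        # falling
}
print(three_phase(prices))
```

With these toy inputs, phase 2 separates the delayed series C from A and B because of the tight Euclidean radius, and phase 3 merges it back because DTW tolerates the shift, while the falling series D stays in its own cluster. The radii were picked by hand for this example and would need tuning on real data.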
