Clustering of large scale QoS time series data in federated clouds using improved variable Chromosome Length Genetic Algorithm (CQGA)

Abstract: Service monitoring in federated clouds generates large-scale QoS time series data containing various unknown frequent and abnormal patterns. Such patterns can be associated with inaccurate resource provisioning, and discovering them makes it possible to avoid violations through predictive and preventive actions. These situations call for sufficient intelligence, in the form of an expert system for decision support. The main challenge is therefore to efficiently discover unknown frequent and abnormal patterns in the QoS time series data of federated clouds. This data is unlabeled and consists of frequent and abnormal structures. Studies have shown that clustering is the most common and efficient method for discovering interesting patterns and structures in unlabeled data. However, clustering normally incurs a time overhead that should be optimized, and it raises accuracy issues, mainly concerning convergence and finding the optimum number of clusters. This work proposes a new genetic-based clustering algorithm that shows better accuracy and speed than state-of-the-art methods. Furthermore, the proposed algorithm finds the optimum number of clusters concurrently with the clustering itself. The accuracy and convergence achieved in the experimental results support its use in expert systems, mainly for resource provisioning and other autonomous decision-making situations in federated clouds. Beyond its scientific contribution, the proposed method can be used in practice by federated cloud service providers.
