Agreement-based fuzzy C-means for clustering data with blocks of features

In real-world problems, patterns are often described by blocks (families) of features, each with well-defined semantics. For instance, in spatiotemporal data the spatial coordinates of the objects (say, x-y coordinates) form one block, while the temporal part of the objects forms another. When clustering objects described by such families of features, it is intuitively justified to expect the blocks to play different roles and contribute differently to the clustering process, while the clustering should still reflect the overall structure of the data set. To address this issue, we introduce an agreement-based fuzzy clustering, that is, a fuzzy clustering with blocks of features. Detailed investigations are carried out for the well-known fuzzy C-means (FCM) algorithm. We propose an extended version of FCM in which a composite distance function is endowed with adjustable weights (parameters) that quantify the impact of each block of features. A global evaluation criterion assesses the quality of the obtained results and serves as the fitness function when the weights are optimized with particle swarm optimization (PSO). The behavior of the proposed method is investigated on synthetic and real-world data as well as in a case study.
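To make the idea of a composite, block-weighted distance concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the composite distance is a weighted sum of squared Euclidean distances, one term per block of features; the abstract does not spell out the exact form. The names `block_weighted_fcm`, `blocks`, and `weights` are hypothetical. The weights are fixed inputs here; the PSO search over weights and the global evaluation criterion are not shown.

```python
import numpy as np

def block_weighted_fcm(X, blocks, weights, c=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-means with one non-negative weight per block of features.

    X       : (N, d) data matrix
    blocks  : list of column-index lists, one list per block (assumed layout)
    weights : one non-negative weight per block (assumed to be given)
    Returns the partition matrix U (N, c) and the prototypes V (c, d).
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape

    # Expand the per-block weights into a per-feature weight vector.
    w = np.zeros(d)
    for cols, wb in zip(blocks, weights):
        w[cols] = wb

    # Random initial partition matrix; each row sums to 1.
    U = rng.random((N, c))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Prototype update: membership-weighted means of the data.
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]

        # Composite squared distance: weighted sum over features,
        # which equals the sum over blocks of w_b * ||x_b - v_b||^2.
        diff = X[:, None, :] - V[None, :, :]
        D2 = np.maximum((diff ** 2 * w).sum(axis=2), 1e-12)

        # Standard FCM membership update applied to the composite distance.
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return U, V
```

For spatiotemporal data, one might call this with `blocks=[[0, 1], [2, 3, 4]]` for two spatial columns and three temporal columns, and try several weight pairs to see how the partition shifts between spatial and temporal structure. In the paper that search is delegated to PSO.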
