A review of robust clustering methods

Deviations from theoretical assumptions, together with the presence of a certain amount of outlying observations, are common in many practical statistical applications. This is also the case when applying Cluster Analysis methods, where such problems can lead to unsatisfactory clustering results. Robust Clustering methods aim to avoid these unsatisfactory results. Moreover, there are certain connections between robust procedures and Cluster Analysis that make Robust Clustering an appealing unifying framework. A review of different robust clustering approaches in the literature is presented. Special attention is paid to methods based on trimming, which try to discard the most outlying observations when carrying out the clustering process.
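To make the trimming idea concrete, a minimal sketch of trimmed k-means, one of the simplest trimming-based clustering methods, is given below. It assumes squared Euclidean distance, a fixed number of groups `k` and a fixed trimming proportion `alpha`. The observations that are discarded are chosen by the data themselves: only the `h = n - floor(n * alpha)` points closest to their nearest center are kept. The fit alternates two "concentration" steps: (1) assign each point to its nearest center and trim all but the `h` closest; (2) replace each center by the mean of its retained points. Neither step increases the trimmed sum of squares, so the loop stops at a local optimum, which is why several random starts are used. The function names (`trimmed_kmeans`, `assign_and_trim`, `update_centers`) are hypothetical illustrations, not code from the reviewed literature.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_and_trim(points, centers, h):
    """Assign each point to its nearest center, then keep only the h closest.

    Returns (labels, objective); labels[i] == -1 marks a trimmed point.
    """
    nearest = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
               for p in points]
    dists = [sq_dist(p, centers[j]) for p, j in zip(points, nearest)]
    order = sorted(range(len(points)), key=lambda i: dists[i])
    kept = set(order[:h])
    labels = [nearest[i] if i in kept else -1 for i in range(len(points))]
    objective = sum(dists[i] for i in kept)  # trimmed within-cluster sum of squares
    return labels, objective


def update_centers(points, labels, centers):
    """Move each center to the mean of its retained (non-trimmed) points."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centers.append(old)  # empty cluster: keep its previous center
    return new_centers


def trimmed_kmeans(points, k, alpha, n_starts=20, max_iter=100, seed=0):
    """Hypothetical sketch of trimmed k-means with random restarts.

    Returns (centers, labels, objective) for the best start found.
    """
    rng = random.Random(seed)
    n = len(points)
    h = n - math.floor(n * alpha)  # number of observations that are not trimmed
    best = None
    for _ in range(n_starts):
        centers = [tuple(p) for p in rng.sample(points, k)]  # random initial centers
        labels, obj = assign_and_trim(points, centers, h)
        for _ in range(max_iter):
            new_centers = update_centers(points, labels, centers)
            new_labels, new_obj = assign_and_trim(points, new_centers, h)
            if new_obj >= obj - 1e-12:  # no further decrease: local optimum
                break
            centers, labels, obj = new_centers, new_labels, new_obj
        if best is None or obj < best[2]:
            best = (centers, labels, obj)
    return best
```

For example, `trimmed_kmeans(points, k=2, alpha=0.1)` discards the 10% of observations that fit worst and clusters the rest. Setting `alpha=0` recovers ordinary k-means, where a single distant outlier can pull a center (or capture a whole group) on its own.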
