Wild adaptive trimming for robust estimation and cluster analysis

Trimming principles play an important role in robust statistics. However, their use for clustering typically requires some preliminary information about the contamination rate and the number of groups. We suggest a fresh approach to trimming that does not rely on this knowledge and that proves to be particularly suited for solving problems in robust cluster analysis. Our approach replaces the original K-population (robust) estimation problem with K distinct one-population steps, which take advantage of the good breakdown properties of trimmed estimators when the trimming level exceeds the usual bound of 0.5. In this setting we prove that exact affine equivariance is lost on one hand, but on the other hand an arbitrarily high breakdown point can be achieved by “anchoring” the robust estimator. We also support the use of adaptive trimming schemes, in order to infer the contamination rate from the data. A further bonus of our methodology is its ability to provide a reliable choice of the usually unknown number of groups.
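To make the idea of a one-population step with trimming level above 0.5 concrete, the following is a minimal sketch and not the estimator studied in the paper. It assumes Euclidean distances, a location-only fit, and a retained subset of size floor((1 - alpha) n); the function name `anchored_trimmed_mean` and the toy data are hypothetical. Starting from an anchor point, it repeatedly keeps the points closest to the current center and recomputes their mean.

```python
import math


def anchored_trimmed_mean(points, anchor, alpha, max_iter=100):
    """Illustrative anchored trimmed mean (assumed scheme, not the paper's).

    points: list of equal-length coordinate tuples
    anchor: starting center, e.g. one of the observations
    alpha:  trimming level in [0, 1); values above 0.5 are allowed
    """
    n = len(points)
    # Size of the retained subset (assumption: floor, with at least one point).
    h = max(1, math.floor((1.0 - alpha) * n))
    center = tuple(float(c) for c in anchor)
    dim = len(center)
    keep = None
    for _ in range(max_iter):
        # Rank observations by distance to the current center.
        order = sorted(range(n), key=lambda i: math.dist(points[i], center))
        new_keep = sorted(order[:h])
        if new_keep == keep:
            break  # retained subset is stable
        keep = new_keep
        # Recompute the center from the retained observations only.
        center = tuple(sum(points[i][j] for i in keep) / h for j in range(dim))
    return center, keep


# Toy data: two tight groups plus two gross outliers (hypothetical values).
group_a = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.2), (0.2, 0.1), (0.0, -0.2)]
group_b = [(10.0, 10.0), (10.1, 9.9), (9.9, 10.2), (10.2, 10.1), (10.0, 9.8)]
outliers = [(100.0, 100.0), (-100.0, 50.0)]
data = group_a + group_b + outliers

# With alpha = 0.7 only 3 of the 12 points are retained at each step.
center_a, keep_a = anchored_trimmed_mean(data, anchor=group_a[0], alpha=0.7)
center_b, keep_b = anchored_trimmed_mean(data, anchor=group_b[0], alpha=0.7)
```

In this toy setting each anchor yields a fit that stays within its own group, even though that group holds fewer than half of the observations. The result depends on where the anchor is placed, which gives some intuition for why exact affine equivariance cannot be expected from such a scheme.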
