The power of monitoring: how to make the most of a contaminated multivariate sample

Diagnostic tools must rely on robust high-breakdown methodologies to avoid distortion in the presence of contamination by outliers. However, a disadvantage of having a single summary of the data, even a robust one, is that important choices concerning the parameters of the robust method, such as the breakdown point, have to be made prior to the analysis, and the effect of such choices may be difficult to evaluate. We argue that an effective solution is to look at several pictures, and possibly at a whole movie, of the available data. This can be achieved by monitoring the results computed through the robust methodology of choice over a range of parameter values. We show the information gain that monitoring provides in the study of complex data structures by analysing multivariate datasets with different high-breakdown techniques. Our findings support the claim that the principle of monitoring is very flexible and that it can lead to robust estimators that are as efficient as possible. We also address, through simulation, some of the tricky inferential issues that arise from monitoring.
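The monitoring idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a simplified trimmed estimator of location and scatter, fitted by concentration steps from a median/MAD start, as a stand-in for whichever high-breakdown method is chosen. The function names (`make_data`, `trimmed_fit`, `monitor`), the simulated bivariate sample, and the grid of breakdown points are all assumptions made for illustration. No consistency factor is applied, so the distances are only meant to be compared across the grid, not against chi-squared quantiles.

```python
import math
import random
import statistics


def make_data(seed=0, n_clean=80, n_out=20):
    """Hypothetical sample: bivariate standard normal plus a cluster of outliers."""
    rng = random.Random(seed)
    clean = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_clean)]
    outliers = [(6 + rng.gauss(0, 0.5), 6 + rng.gauss(0, 0.5)) for _ in range(n_out)]
    return clean + outliers  # outliers are the last n_out units


def mean_cov(points):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def sq_distances(points, center, cov):
    """Squared Mahalanobis distances, using the closed-form 2x2 inverse."""
    (mx, my), (sxx, sxy, syy) = center, cov
    det = sxx * syy - sxy ** 2
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        out.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return out


def trimmed_fit(points, bdp, max_steps=50):
    """Trimmed location/scatter keeping about n * (1 - bdp) units (simplified)."""
    n = len(points)
    h = max(3, math.ceil(round(n * (1.0 - bdp), 9)))
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Robust start: coordinate-wise medians and squared MADs, zero correlation.
    cx, cy = statistics.median(xs), statistics.median(ys)
    mad_x = statistics.median([abs(x - cx) for x in xs])
    mad_y = statistics.median([abs(y - cy) for y in ys])
    center, cov = (cx, cy), (mad_x ** 2, 0.0, mad_y ** 2)
    kept = None
    for _ in range(max_steps):
        d2 = sq_distances(points, center, cov)
        order = sorted(range(n), key=d2.__getitem__)[:h]
        if kept is not None and set(order) == kept:
            break  # the retained subset no longer changes
        kept = set(order)
        center, cov = mean_cov([points[i] for i in order])
    return center, cov


def monitor(points, bdp_grid):
    """Squared distances of every unit for each breakdown point in the grid."""
    return {bdp: sq_distances(points, *trimmed_fit(points, bdp)) for bdp in bdp_grid}


if __name__ == "__main__":
    pts = make_data()
    grid = [0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    paths = monitor(pts, grid)
    for bdp in grid:
        d2_out = paths[bdp][80:]  # the 20 planted outliers
        print(f"bdp={bdp:.1f}  mean squared distance of outliers: "
              f"{sum(d2_out) / len(d2_out):.1f}")
```

Reading the printed values from the highest breakdown point down to zero gives a small-scale version of the "movie": the planted outliers stand far from the fit while trimming is heavy, and their distances shrink once they enter the fitting subset and mask themselves. In practice the full trajectories in `paths` would be plotted against the breakdown point, one line per unit.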
