An Object-Oriented Framework for Robust Multivariate Analysis

An object-oriented framework for robust multivariate analysis was developed that takes advantage of the S4 class system of the R programming environment, which facilitates the creation and maintenance of reusable, modular components. The framework resides in the packages robustbase and rrcov and includes an almost complete set of algorithms for computing robust multivariate location and scatter, various robust methods for principal component analysis, and robust linear and quadratic discriminant analysis. The design of these methods follows common patterns, which we call statistical design patterns in analogy to the design patterns widely used in software engineering. The application of the framework to data analysis, as well as its possible extension through the development of new methods, is demonstrated on examples that are themselves part of the package rrcov.
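To make the idea of a shared design pattern concrete, the following is a minimal sketch in base R of how an S4 hierarchy can let classical and robust estimators sit behind one interface. All class, generic, and function names here (LocScatter, ClassicLocScatter, ToyRobustLocScatter, center, distances, classicEst, toyRobustEst) are hypothetical and chosen for illustration; they are not the classes or functions defined in rrcov or robustbase. The "robust" estimator is a deliberately simple stand-in (coordinate-wise median with a diagonal MAD-based scatter), not one of the algorithms the framework implements.

```r
library(methods)

# Hypothetical virtual base class: any location/scatter estimate
# carries a center vector and a scatter matrix.
setClass("LocScatter",
         slots = c(center = "numeric", cov = "matrix"),
         contains = "VIRTUAL")

# Two concrete subclasses sharing the same slots and interface.
setClass("ClassicLocScatter", contains = "LocScatter")
setClass("ToyRobustLocScatter", contains = "LocScatter")

# Generics defined once on the base class are inherited by every estimator.
setGeneric("center", function(object) standardGeneric("center"))
setMethod("center", "LocScatter", function(object) object@center)

setGeneric("distances", function(object, x) standardGeneric("distances"))
setMethod("distances", "LocScatter", function(object, x) {
  # mahalanobis() returns squared distances, so take the square root.
  sqrt(mahalanobis(as.matrix(x), object@center, object@cov))
})

# Constructor for the classical estimate (sample mean and covariance).
classicEst <- function(x) {
  x <- as.matrix(x)
  new("ClassicLocScatter", center = colMeans(x), cov = stats::cov(x))
}

# Toy robust stand-in (an assumption for illustration only):
# coordinate-wise median and a diagonal scatter built from the MAD.
toyRobustEst <- function(x) {
  x <- as.matrix(x)
  new("ToyRobustLocScatter",
      center = apply(x, 2, median),
      cov = diag(apply(x, 2, mad)^2, nrow = ncol(x)))
}

# Usage: 100 clean points plus one gross outlier in the last row.
set.seed(1)
x <- rbind(matrix(rnorm(200), ncol = 2), c(50, 50))
estimates <- list(classicEst(x), toyRobustEst(x))
for (est in estimates) {
  d <- distances(est, x)
  cat(class(est), "center:", round(center(est), 2),
      "outlier distance:", round(d[nrow(x)], 2), "\n")
}
```

Because both estimators extend the same virtual class, code written against the generics works unchanged for either one, and a new estimator only needs a constructor that returns an object of a subclass. In this sketch the outlier receives a much larger distance under the toy robust estimate than under the classical one, since the classical mean and covariance are themselves pulled toward the outlier.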
