On an Optimization Problem in Robust Statistics

In this article, we consider a large class of computational problems in robust statistics that can be formulated as the selection of an optimal subset of the data with respect to some criterion function. Algorithms for such problems in the literature fall broadly into two classes: those based on purely random search and those based on deterministically guided strategies. Although these methods give satisfactory results in some specific examples, none of them works satisfactorily across a large class of similar problems. Either their expected waiting time to hit the true optimum is very long, or they fail to escape a local optimum once trapped there. Here, we propose two probabilistic search algorithms and, under certain conditions on their parameters, establish their convergence to the true optimum. We also present results on the probability of hitting the true optimum when the algorithms are run for a finite number of iterations. Finally, we compare the performance of our algorithms with that of some commonly used algorithms for computing several popular robust multivariate statistics on real datasets.
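To make the subset-selection formulation concrete, the following is a minimal sketch of one generic probabilistic search: simulated annealing with single-point swap moves and a Metropolis acceptance rule. It is applied to the minimum covariance determinant (MCD) criterion for bivariate data, where the goal is to find the h-subset whose sample covariance matrix has the smallest determinant. This is not the authors' proposed algorithm. The choice of MCD as the criterion, the function names `cov_det` and `annealed_subset_search`, the logarithmic cooling schedule `t0 / log(k + 1)`, and all default parameters are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def cov_det(points: Sequence[Point]) -> float:
    """Determinant of the 2x2 sample covariance matrix (the MCD criterion in 2-D)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return sxx * syy - sxy ** 2


def annealed_subset_search(
    data: Sequence[Point],
    h: int,
    criterion: Callable[[Sequence[Point]], float],
    n_iter: int = 5000,
    t0: float = 1.0,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Hypothetical sketch: minimise `criterion` over h-subsets of `data`.

    Returns the sorted indices of the best subset visited and its criterion value.
    """
    n = len(data)
    if not 1 < h < n:
        raise ValueError("need 1 < h < len(data)")
    rng = random.Random(seed)
    current = rng.sample(range(n), h)           # random initial h-subset (indices)
    chosen = set(current)
    outside = [i for i in range(n) if i not in chosen]
    f_cur = criterion([data[i] for i in current])
    best, f_best = list(current), f_cur
    for k in range(1, n_iter + 1):
        temp = t0 / math.log(k + 1)             # assumed logarithmic cooling schedule
        a = rng.randrange(h)                    # position of the point to drop
        b = rng.randrange(n - h)                # position of the point to add
        cand = list(current)
        cand[a] = outside[b]
        f_cand = criterion([data[i] for i in cand])
        # Metropolis rule: always accept an improvement; accept a worse subset
        # with probability exp(-increase / temp), which permits escapes from
        # local optima while the temperature is still high.
        if f_cand <= f_cur or rng.random() < math.exp(-(f_cand - f_cur) / temp):
            outside[b] = current[a]             # the dropped point moves outside
            current, f_cur = cand, f_cand
            if f_cur < f_best:                  # keep the best subset ever visited
                best, f_best = list(current), f_cur
    return sorted(best), f_best
```

The acceptance step is what separates this from the two classes of methods mentioned above. If only improvements are accepted, the search becomes a deterministically guided swap search that can stall in a local optimum. If every move is accepted, it becomes an undirected random walk over subsets. As a usage example, take a 5 × 4 grid of 20 clean points followed by five distant outliers and set `h=20`. The search should then return indices 0 to 19, the clean points.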
