Fast Computation of Tukey Trimmed Regions and Median in Dimension p > 2

ABSTRACT Given data in ℝ^p, a Tukey κ-trimmed region is the set of all points whose Tukey depth with respect to the data is at least κ. Because they are visual, affine equivariant and robust, Tukey regions are useful tools in nonparametric multivariate analysis. While these regions are easy to define and interpret, their practical use in applications has so far been impeded by the lack of efficient computational procedures in dimension p > 2. We construct two novel algorithms to compute a Tukey κ-trimmed region: a naïve one, and a more sophisticated one that is much faster than known algorithms. Further, a strict bound on the number of facets of a Tukey region is derived. In a large simulation study, the novel fast algorithm is compared with the naïve one, which is slower but exact by construction; in every case both yield the same, correct result. Finally, the approach is extended to an algorithm that calculates the innermost Tukey region and its barycenter, the Tukey median. Supplementary materials for this article are available online.
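The sketch below illustrates the definition. It is not one of the article's algorithms. A point x has Tukey depth at least κ exactly when, for every unit direction u, u'x does not exceed the κ-th largest projection u'X_i of the data. The κ-trimmed region is therefore the intersection of the halfspaces {z : u'z ≤ q_u}.

Instead of handling all directions exactly, the sketch checks only a finite, randomly drawn set. Its depth values are therefore upper bounds on the exact Tukey depth, and its region test describes an outer approximation (a superset) of the exact region. Two assumptions apply: κ is taken as an integer count with 1 ≤ κ ≤ n, and the helper names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def random_unit_directions(p: int, m: int, seed: int = 0) -> List[List[float]]:
    """Draw m directions uniformly on the unit sphere in R^p by normalising Gaussian vectors."""
    rng = random.Random(seed)
    directions: List[List[float]] = []
    while len(directions) < m:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:  # skip (practically impossible) near-zero draws
            directions.append([c / norm for c in v])
    return directions


def dot(u: Vector, x: Vector) -> float:
    return sum(a * b for a, b in zip(u, x))


def approx_tukey_depth(x: Vector, data: Sequence[Vector], directions) -> int:
    """Depth as a count: the minimum, over the given directions u, of the number of data
    points in the closed halfspace {y : u'y >= u'x}. A finite set of directions can only
    overestimate the exact Tukey depth (hypothetical helper, random-direction approximation)."""
    best = len(data)
    for u in directions:
        t = dot(u, x)
        best = min(best, sum(1 for y in data if dot(u, y) >= t))
    return best


def in_approx_trimmed_region(x: Vector, data: Sequence[Vector], kappa: int, directions) -> bool:
    """Test x against the halfspaces {z : u'z <= q_u}, where q_u is the kappa-th largest
    projection of the data on u. Over all unit u this is the exact Tukey kappa-trimmed
    region; over finitely many u it accepts every point of that region and possibly more."""
    if not 1 <= kappa <= len(data):
        raise ValueError("kappa is assumed to be an integer count between 1 and n")
    for u in directions:
        q_u = sorted((dot(u, y) for y in data), reverse=True)[kappa - 1]
        if dot(u, x) > q_u:
            return False
    return True


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    dirs = random_unit_directions(p=2, m=200)
    print(approx_tukey_depth((0.5, 0.5), square, dirs))   # 2
    print(in_approx_trimmed_region((0.5, 0.5), square, 3, dirs))  # False
```

For the four corners of the unit square, every halfspace through the centre keeps exactly two corners, so the centre has depth 2. It therefore lies in the 2-trimmed region but not in the 3-trimmed one. Both functions apply the same test to each direction, so `approx_tukey_depth(x, ...) >= kappa` holds exactly when `in_approx_trimmed_region(x, ..., kappa, ...)` does.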