The Catline for Deep Regression

Motivated by the notion of regression depth (Rousseeuw and Hubert, 1996), we introduce the catline, a new method for simple linear regression. For any bivariate data set $Z_n = \{(x_i, y_i);\ i = 1, \ldots, n\}$, the regression depth of the catline is at least $n/3$. This lower bound is attained for data lying on a convex or concave curve, whereas for perfectly linear data the catline attains a depth of $n$. We construct an $O(n \log n)$ algorithm for the catline, so it can be computed quickly in practice. The catline is Fisher-consistent at any linear model $y = \beta x + \alpha + e$ in which the error distribution satisfies $\mathrm{med}(e \mid x) = 0$, which encompasses skewed and/or heteroscedastic errors. The breakdown value of the catline is 1/3, and its influence function is bounded. At the bivariate Gaussian distribution its asymptotic relative efficiency compared to the $L_1$ line is 79.3% for the slope and 88.9% for the intercept. The finite-sample relative efficiencies are in close agreement with these values. This combination of properties makes the catline an attractive fitting method.
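To make the depth statements concrete, the following is a minimal sketch of how the regression depth of a given line could be computed in simple regression. It is not the catline algorithm itself, which the abstract does not describe. It assumes the usual definition of regression depth: the smallest number of observations that must be removed before some vertical split of the x-axis leaves all residuals strictly positive on one side and strictly negative on the other. The function name `regression_depth` and the tolerance parameter `tol` are hypothetical choices made for this illustration.

```python
def regression_depth(x, y, slope, intercept, tol=1e-12):
    """Regression depth of the line y = slope * x + intercept.

    Sketch under an assumed definition: the minimum, over all splits of
    the sorted x-values, of the number of points that prevent the line
    from being a "nonfit". Points with zero residual count on both sides.
    """
    pts = sorted(zip(x, y))
    res = [yi - (slope * xi + intercept) for xi, yi in pts]
    nonneg = [r >= -tol for r in res]  # residual >= 0 (within tolerance)
    nonpos = [r <= tol for r in res]   # residual <= 0 (within tolerance)
    total_nonneg = sum(nonneg)
    total_nonpos = sum(nonpos)

    n = len(pts)
    depth = n
    left_nonneg = left_nonpos = 0
    i = 0
    while i < n:
        # Move every point sharing this x-value to the left side at once,
        # so that a split never falls between tied x-values.
        j = i
        while j < n and pts[j][0] == pts[i][0]:
            left_nonneg += nonneg[j]
            left_nonpos += nonpos[j]
            j += 1
        right_nonneg = total_nonneg - left_nonneg
        right_nonpos = total_nonpos - left_nonpos
        depth = min(depth,
                    left_nonneg + right_nonpos,
                    left_nonpos + right_nonneg)
        i = j
    return depth


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]
    ys = [2 * v + 1 for v in xs]  # perfectly linear data
    print(regression_depth(xs, ys, slope=2, intercept=1))
```

For perfectly linear data and the generating line, every residual is zero, so each split counts all $n$ points and the function returns $n$, in line with the depth of $n$ quoted above. After the initial sort, the scan is linear in $n$.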
