Outlier detection in interval data

A multivariate outlier detection method for interval data is proposed, based on a parametric model of the intervals. The trimmed maximum likelihood principle is adapted to estimate the model parameters robustly. A simulation study demonstrates the usefulness of the robust estimates for outlier detection, and new diagnostic plots give deeper insight into the structure of real-world interval data.
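As a rough illustration of the idea, and not the authors' implementation, the sketch below makes several assumptions that the abstract does not state: a single interval variable, each interval represented by its midpoint and the logarithm of its range, and a bivariate Gaussian model for that pair. Under a Gaussian model, maximising the trimmed likelihood over subsets of size h amounts to finding the h-subset whose covariance has the smallest determinant, which is approximated here with simple concentration steps. All function names and the parameters `trim`, `level`, `n_starts` and `n_steps` are hypothetical.

```python
import math
import random


def to_features(intervals):
    """Map each interval (lower, upper) to (midpoint, log of range); needs upper > lower."""
    return [((lo + hi) / 2.0, math.log(hi - lo)) for lo, hi in intervals]


def mean_cov(points):
    """Gaussian maximum likelihood estimates (divisor n) for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def det2(cov):
    """Determinant of a symmetric 2x2 matrix stored as (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy * sxy


def sq_mahalanobis(p, mean, cov):
    """Squared Mahalanobis distance of a 2-D point, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det2(cov)


def trimmed_fit(points, h, n_starts=20, n_steps=50, seed=0):
    """Approximate the trimmed Gaussian fit: keep the h points with highest likelihood."""
    rng = random.Random(seed)
    n = len(points)
    best = None
    for _ in range(n_starts):
        subset = rng.sample(range(n), 3)  # small random starting subset
        for _ in range(n_steps):
            mean, cov = mean_cov([points[i] for i in subset])
            if det2(cov) <= 1e-12:  # degenerate start, discard it
                subset = None
                break
            # Concentration step: refit on the h points closest to the current fit.
            order = sorted(range(n), key=lambda i: sq_mahalanobis(points[i], mean, cov))
            new_subset = order[:h]
            if set(new_subset) == set(subset):
                break
            subset = new_subset
        if subset is None:
            continue
        mean, cov = mean_cov([points[i] for i in subset])
        d = det2(cov)
        if d > 1e-12 and (best is None or d < best[0]):
            best = (d, mean, cov)
    if best is None:
        raise ValueError("no non-degenerate subset found")
    return best[1], best[2]


def detect_outliers(intervals, trim=0.75, level=0.975):
    """Flag intervals whose robust squared distance exceeds a chi-square cutoff."""
    points = to_features(intervals)
    h = int(trim * len(points))
    mean, cov = trimmed_fit(points, h)
    # Chi-square quantile with 2 degrees of freedom has the closed form -2 ln(1 - level).
    cutoff = -2.0 * math.log(1.0 - level)
    # No consistency or small-sample correction is applied to cov in this sketch,
    # so distances are somewhat inflated and some regular intervals may be flagged.
    flags = [sq_mahalanobis(p, mean, cov) > cutoff for p in points]
    return flags, mean, cov
```

Calling `detect_outliers` on a list of `(lower, upper)` pairs returns one Boolean flag per interval together with the robust location and scatter estimates. Because the fit uses only the retained subset, a small group of atypical intervals has little influence on the estimates and therefore stands out in the distances.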
