New robust dynamic plots for regression mixture detection

The forward search is a powerful general method for detecting multiple masked outliers and for determining their effect on inferences about models fitted to data. By monitoring a series of statistics computed on subsets of the data of increasing size, we obtain multiple views of any hidden structure. A long-standing problem of the forward search has been the lack of an automatic link among the great variety of plots that are monitored. Interesting features often emerge unexpectedly during the progression of the search, and only when a specific combination of forward plots is inspected at the same time. The analyst should therefore be able to interact with the plots and to redefine or refine the links among them. Without dynamic linking and interaction tools, the analyst risks missing relevant hidden information. In this paper we fill this gap by providing a set of new robust graphical tools, whose power we demonstrate on several regression problems. Through the analysis of real and simulated data, we give a series of examples in which dynamic interaction with different “robust plots” is used to highlight the presence of groups of outliers and regression mixtures, and to appraise the effect that these hidden groups exert on the fitted model.
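To make the idea of monitoring statistics on subsets of increasing size concrete, the following is a minimal sketch of a forward search for linear regression. It is an illustration under stated assumptions, not the implementation used in the paper: the function name `forward_search`, its parameters, the random elemental-subset approximation to a least-median-of-squares start, and the simplified monitored statistic (the smallest absolute residual outside the subset, scaled by the subset's residual standard deviation and ignoring leverage) are all choices made for this example.

```python
import numpy as np

def forward_search(X, y, n_trials=500, seed=0):
    """Sketch of a regression forward search (hypothetical helper).

    Returns the fitting subset at each step (sizes p+1, ..., n-1) and, for
    each step, the minimum scaled absolute residual among excluded units.
    """
    n, p = X.shape
    rng = np.random.default_rng(seed)

    # Robust start (assumption): approximate least median of squares by
    # fitting random elemental subsets of p units and keeping the fit with
    # the smallest median squared residual over all units.
    best_crit, best_beta = np.inf, None
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        crit = np.median((y - X @ beta) ** 2)
        if crit < best_crit:
            best_crit, best_beta = crit, beta

    # Initial subset: the p + 1 units closest to the robust fit.
    subset = np.argsort((y - X @ best_beta) ** 2)[: p + 1]

    subsets, min_resid = [], []
    for m in range(p + 1, n):
        subsets.append(np.sort(subset))
        # Least squares fit on the current subset of m units.
        beta = np.linalg.lstsq(X[subset], y[subset], rcond=None)[0]
        e = y - X @ beta
        s = np.sqrt(np.sum(e[subset] ** 2) / (m - p))
        # Monitored statistic (simplified: no leverage correction).
        outside = np.setdiff1d(np.arange(n), subset)
        min_resid.append(np.min(np.abs(e[outside])) / s)
        # Next subset: the m + 1 units with the smallest squared residuals.
        subset = np.argsort(e ** 2)[: m + 1]
    return subsets, np.array(min_resid)

# Synthetic illustration: 50 units on a line plus 5 shifted outliers.
i = np.arange(50)
x_clean = np.linspace(0.0, 10.0, 50)
y_clean = 1.0 + 2.0 * x_clean + 0.05 * np.sin(7.0 * i)  # small deterministic noise
x_out = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
y_out = 1.0 + 2.0 * x_out + 20.0                         # outliers: units 50..54
x = np.concatenate([x_clean, x_out])
y = np.concatenate([y_clean, y_out])
X = np.column_stack([np.ones_like(x), x])

subsets, min_resid = forward_search(X, y)
```

On this synthetic data the outlying units should stay out of the subset until the final steps, and the monitored statistic should jump sharply at the step just before the first of them enters. A plot of `min_resid` against subset size is one example of the forward plots that the abstract proposes to link dynamically.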
