We describe empirical work in the domain of clustering and outlier detection for the analysis of European trade data. It is our first attempt to evaluate the benefits and limitations of the forward search approach to regression and multivariate analysis (Atkinson and Riani, Robust diagnostic regression analysis, Springer, 2000; Atkinson et al., Exploring multivariate data with the forward search, Springer, 2004) within a concrete application scenario, and in relation to a comparable backward method developed at the JRC by Arsenis et al. (Price outliers in EU external trade data, Enlargement and Integration Workshop 2005, 2005). Our findings suggest that automatic clustering based on Mahalanobis distances may be inappropriate in the presence of a high-density area in the dataset. Follow-up work is discussed extensively in Riani et al. (Fitting mixtures of regression lines with the forward search, Mining massive data sets for security, IOS, 2008).
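To fix ideas, the following is a minimal sketch of a multivariate forward search in the spirit of Atkinson et al. (2004). It is not the procedure used in this work: at each subset size m it fits the mean and covariance to the current subset, computes the Mahalanobis distance of every unit from that fit, records the minimum distance among units outside the subset, and then takes the m + 1 closest units as the next subset. A sharp rise in the monitored curve suggests that the next units to enter belong to a different group or are outliers. The function name `forward_search` and the starting rule (the m0 units closest to the coordinatewise median) are hypothetical simplifications. The published method chooses the starting subset robustly and judges the curve against simulation envelopes.

```python
import numpy as np


def forward_search(Y, m0=None):
    """Minimal multivariate forward search (illustrative sketch only).

    Returns, for each subset size m = m0, ..., n - 1, the minimum
    Mahalanobis distance among the units NOT in the current subset.
    """
    Y = np.asarray(Y, dtype=float)
    n, v = Y.shape
    if m0 is None:
        m0 = v + 1  # smallest size that can give a nonsingular covariance
    # Hypothetical starting rule: the m0 units closest (Euclidean) to the
    # coordinatewise median. The published procedure uses a robust start.
    med = np.median(Y, axis=0)
    subset = np.argsort(np.sum((Y - med) ** 2, axis=1))[:m0]
    min_outside = []
    for m in range(m0, n):
        mu = Y[subset].mean(axis=0)
        S = np.atleast_2d(np.cov(Y[subset], rowvar=False))
        diff = Y - mu
        # Squared Mahalanobis distance of every unit from the subset fit.
        d2 = np.sum(diff * np.linalg.solve(S, diff.T).T, axis=1)
        outside = np.setdiff1d(np.arange(n), subset)
        min_outside.append(np.sqrt(d2[outside].min()))
        # Grow the subset: the m + 1 units with the smallest distances
        # (units may also leave the subset at this step).
        subset = np.argsort(d2)[: m + 1]
    return np.array(min_outside)


# Usage: two well-separated groups (40 + 10 units) in two dimensions.
rng = np.random.default_rng(0)
Y = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(100, 1, (10, 2))])
curve = forward_search(Y, m0=10)
print(int(np.argmax(curve)) + 10)  # subset size m at which the curve peaks
```

With groups this far apart, one would expect the curve to peak at m = 40, where the subset holds exactly the first group and the nearest outside unit belongs to the second. Once a unit of the second group enters, the covariance estimate inflates and the distances drop. This masking effect, together with the behaviour of such distances near dense regions, is what makes fully automatic clustering on Mahalanobis distances delicate.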
[1] David A. Belsley et al., Regression Analysis and its Application: A Data-Oriented Approach; Applied Linear Regression; Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, 1981.
[2] Anthony C. Atkinson et al., Exploring Multivariate Data with the Forward Search, 2004.
[3] Maurizio Vichi et al., Data Analysis, Classification and the Forward Search, 2006.
[4] Marco Riani et al., Random Start Forward Searches with Envelopes for Detecting Clusters in Multivariate Data, 2006.
[5] Jakub Piskorski et al., Mining Massive Data Sets for Security, 2008.
[6] Anthony C. Atkinson et al., Robust Diagnostic Regression Analysis, 2000.
[7] Domenico Perrotta et al., Fitting Mixtures of Regression Lines with the Forward Search, 2008.