Clusters, outliers, and regression: fixed point clusters

Fixed point clustering is a new stochastic approach to cluster analysis. The definition of a single fixed point cluster (FPC) is based on a simple parametric model, but, unlike mixture modeling and other approaches, it makes no parametric assumption about the dataset as a whole. An FPC is defined as a data subset that is exactly the set of non-outliers with respect to its own parameter estimators. This paper concentrates on the theoretical foundation of FPC analysis as a method for clusterwise linear regression, i.e., the single clusters are modeled as linear regressions with normal errors. In this setup, fixed point clustering is based on an iteratively reweighted estimation that assigns zero weight to all outliers. FPCs are non-hierarchical, but they may overlap and include each other. The number of clusters does not need to be specified. Consistency results are given for certain mixture models of interest in cluster analysis, and convergence of a fixed point algorithm is shown. An application to a real dataset shows that, when the usual assumptions of model-based cluster analysis are violated, fixed point clustering can highlight interesting features of a dataset that differ from those found by maximum likelihood methods.
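As a rough illustration of the fixed point idea, the following Python sketch restricts clusterwise regression to a single predictor (y = a + b·x) so that it needs only the standard library. Starting from an initial subset, it fits least squares on the current subset, keeps every point whose squared residual is at most c times the estimated error variance (weight 1), drops the rest (weight 0), and stops when the subset reproduces itself. The helper names `fit_line`, `fpc_step` and `fixed_point_cluster`, the unbiased variance estimate, the default cutoff c = 6.635 (approximately the 0.99 quantile of the χ²₁ distribution) and the `max_iter` safeguard are assumptions made for this example; the paper's exact estimators and constants may differ.

```python
from typing import FrozenSet, Iterable, List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float],
             idx: FrozenSet[int]) -> Optional[Tuple[float, float, float]]:
    """Least squares fit of y = a + b*x on the points in idx.

    Returns (a, b, s2), where s2 is the unbiased residual variance
    (an assumption for this sketch), or None if the fit is degenerate.
    """
    n = len(idx)
    if n < 3:  # need n - 2 > 0 degrees of freedom for s2
        return None
    mx = sum(xs[i] for i in idx) / n
    my = sum(ys[i] for i in idx) / n
    sxx = sum((xs[i] - mx) ** 2 for i in idx)
    if sxx == 0.0:  # no spread in x: slope is undefined
        return None
    sxy = sum((xs[i] - mx) * (ys[i] - my) for i in idx)
    b = sxy / sxx
    a = my - b * mx
    rss = sum((ys[i] - a - b * xs[i]) ** 2 for i in idx)
    return a, b, rss / (n - 2)


def fpc_step(xs: List[float], ys: List[float], idx: FrozenSet[int],
             c: float) -> Optional[FrozenSet[int]]:
    """One reweighting step: weight 1 for non-outliers, 0 for outliers."""
    fit = fit_line(xs, ys, idx)
    if fit is None:
        return None
    a, b, s2 = fit
    # A point is a non-outlier if its squared residual is at most c * s2.
    return frozenset(i for i in range(len(xs))
                     if (ys[i] - a - b * xs[i]) ** 2 <= c * s2)


def fixed_point_cluster(xs: List[float], ys: List[float], start: Iterable[int],
                        c: float = 6.635,
                        max_iter: int = 100) -> Optional[FrozenSet[int]]:
    """Iterate fpc_step until the subset equals its own non-outlier set."""
    current = frozenset(start)
    for _ in range(max_iter):
        nxt = fpc_step(xs, ys, current, c)
        if nxt is None:
            return None  # degenerate subset, no FPC from this start
        if nxt == current:
            return current  # fixed point reached
        current = nxt
    return None  # safeguard: no fixed point within max_iter steps


# Hypothetical toy data: ten points near y = x plus three gross outliers.
xs = [float(i) for i in range(10)] + [2.0, 5.0, 8.0]
ys = [i + 0.1 * (-1) ** i for i in range(10)] + [20.0, -15.0, 30.0]

print(sorted(fixed_point_cluster(xs, ys, start=range(5))))
# prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(len(fixed_point_cluster(xs, ys, start=range(13))))
# prints 13
```

On this toy data, starting from the first five points converges to the ten points near y = x, while starting from all thirteen points returns the whole dataset, because the outliers inflate the variance estimate enough to count as non-outliers themselves. The two results are nested FPCs, which shows why FPCs may include each other and why the result depends on the starting subset.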

Christian Hennig
