Scale-invariant clustering with minimum volume ellipsoids

This paper develops theory and algorithms for a new clustering metric. Under this metric, the best clustering is the one with the smallest total cluster volume, where the volume of a cluster is defined as the volume of the minimum volume ellipsoid (MVE) enclosing all data points in the cluster. The metric is scale-invariant: the optimal clusters do not change under an affine transformation of the data space. We introduce the concept of outliers for the new metric and show that the proposed treatment of outliers asymptotically recovers the data distribution when the data come from a single multivariate Gaussian distribution. Two heuristic algorithms are presented that attempt to optimize the new metric. In a series of empirical studies on simulated Gaussian data, we show that volume-based clustering outperforms well-known clustering methods such as k-means, Ward's method, SOM, and model-based clustering.
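To make the metric concrete, the sketch below scores a given partition by the sum of its clusters' MVE volumes. It is an illustration under stated assumptions, not the paper's algorithms: the MVE is approximated with Khachiyan's iterative method (a standard technique swapped in here), and the names `mve`, `mve_volume`, and `total_volume` are hypothetical. Each cluster is assumed to contain at least d + 1 affinely independent points.

```python
import numpy as np
from math import gamma, pi


def mve(points, tol=1e-6, max_iter=5000):
    """Approximate the minimum volume enclosing ellipsoid
    {x : (x - c)^T A (x - c) <= 1} with Khachiyan's algorithm.
    Assumption: this solver is an illustrative choice, not the paper's."""
    P = np.asarray(points, dtype=float)
    n, d = P.shape
    Q = np.vstack([P.T, np.ones(n)])           # lifted points, shape (d+1, n)
    u = np.full(n, 1.0 / n)                    # weights on the data points
    for _ in range(max_iter):
        X = (Q * u) @ Q.T                      # weighted second-moment matrix
        M = np.sum(Q * np.linalg.solve(X, Q), axis=0)  # q_i^T X^{-1} q_i
        j = int(np.argmax(M))
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        done = np.linalg.norm(new_u - u) < tol
        u = new_u
        if done:
            break
    c = P.T @ u                                # ellipsoid center
    S = (P.T * u) @ P - np.outer(c, c)         # weighted scatter about c
    A = np.linalg.inv(S) / d                   # shape matrix
    return A, c


def mve_volume(points):
    """Volume of the (approximate) MVE of one cluster."""
    A, _ = mve(points)
    d = A.shape[0]
    unit_ball = pi ** (d / 2.0) / gamma(d / 2.0 + 1.0)
    return unit_ball / np.sqrt(np.linalg.det(A))


def total_volume(X, labels):
    """Clustering score: sum of per-cluster MVE volumes (smaller is better)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return sum(mve_volume(X[labels == k]) for k in np.unique(labels))
```

An affine map x -> Tx + b multiplies the volume of every ellipsoid by |det T|, so it rescales the score of every candidate partition by the same factor and leaves their ranking unchanged. This is the invariance property stated in the abstract; comparing `total_volume(X, labels)` with `total_volume(X @ T.T + b, labels)` for an invertible `T` is a quick way to check it numerically.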
