The $k$-means algorithm for 3D shapes with an application to apparel design

Clustering objects by shape is of key importance in many scientific fields. In this paper we focus on the case where the shape of an object is represented by a configuration matrix of landmarks. It is well known that the resulting shape space has the structure of a finite-dimensional Riemannian manifold, which is non-Euclidean and therefore difficult to work with. Papers on clustering in this space are scarce. The foundation of the $k$-means algorithm is the fact that the sample mean is the value that minimizes the sum of squared Euclidean distances from each point to the centroid of the cluster to which it belongs. Our idea is therefore to integrate Procrustes-type distances and the Procrustes mean into the $k$-means algorithm in order to adapt it to the shape analysis context. As far as we know, there have been only two previous attempts in this direction. We propose to adapt the classical Lloyd $k$-means algorithm to the context of statistical shape analysis, focusing on the three-dimensional case. We present a study comparing its performance with that of the Hartigan-Wong $k$-means algorithm, which was previously adapted to the field of statistical shape analysis. We demonstrate the better performance of the Lloyd version and, finally, propose adding a trimming procedure. We apply both to a 3D database obtained from an anthropometric survey of the Spanish female population conducted in 2006. The algorithms presented in this paper are available in the Anthropometry R package, whose most current version is always available from the Comprehensive R Archive Network.
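The idea of replacing the Euclidean distance and the sample mean in Lloyd's algorithm with Procrustes counterparts can be sketched as follows. This is a minimal illustration in Python with NumPy, not the code of the Anthropometry R package. All function names are hypothetical, the distance used is assumed to be the partial Procrustes distance (rotation only, after centering and scaling to unit size), and the cluster mean is approximated by a simple generalized Procrustes iteration. Trimming is not included.

```python
import numpy as np


def preshape(X):
    """Remove location and scale: center the landmarks, scale to unit norm."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc)


def rotate_onto(X, ref):
    """Rotate the h x 3 configuration X to best match ref (no reflections)."""
    U, _, Vt = np.linalg.svd(X.T @ ref)
    d = np.sign(np.linalg.det(U @ Vt))  # force a proper rotation, det = +1
    return X @ (U @ np.diag([1.0, 1.0, d]) @ Vt)


def procrustes_distance(X, Y):
    """Partial Procrustes distance between two preshapes (assumed choice)."""
    return np.linalg.norm(rotate_onto(X, Y) - Y)


def procrustes_mean(shapes, n_iter=10):
    """Approximate Procrustes mean: align to the current mean, then average."""
    mean = shapes[0]
    for _ in range(n_iter):
        aligned = np.array([rotate_onto(X, mean) for X in shapes])
        mean = preshape(aligned.mean(axis=0))
    return mean


def lloyd_shapes(shapes, k, n_iter=20, seed=0):
    """Lloyd-style k-means on landmark configurations (each h x 3)."""
    rng = np.random.default_rng(seed)
    shapes = np.array([preshape(X) for X in shapes])
    # Initial centers: k distinct shapes drawn at random from the sample.
    centers = shapes[rng.choice(len(shapes), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center in Procrustes distance.
        dist = np.array([[procrustes_distance(X, c) for c in centers]
                         for X in shapes])
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: Procrustes mean of each non-empty cluster.
        for j in range(k):
            members = shapes[labels == j]
            if len(members) > 0:
                centers[j] = procrustes_mean(members)
    return labels, centers
```

Calling `lloyd_shapes(shapes, k=2)` on a list of `h x 3` landmark arrays returns one cluster label per shape together with the `k` mean shapes. Because both steps work on preshapes and align by rotation, the result does not depend on the position, size, or orientation of the input configurations.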
