Pseudo-centroid clustering

Pseudo-centroid clustering replaces the traditional centroid, defined as a center of gravity, with a pseudo-centroid (a coordinate-free centroid). The advantage is that it applies to clustering problems in which points have neither numerical coordinates nor categorical coordinates that are translated into numerical form. Such problems, for which classical centroids do not exist, are particularly important in the social sciences, marketing, psychology and economics. There, distances are not computed from vector coordinates but are expressed in terms of characteristics such as affinity relationships, psychological preferences, advertising responses, polling data and market interactions; distance, broadly conceived, measures the similarity (or dissimilarity) of characteristics, functions or structures. We formulate a K-PC algorithm analogous to the K-Means algorithm and focus on two key types of pseudo-centroids, MinMax-centroids and (weighted) MinSum-centroids, which give rise to a K-MinMax algorithm and a K-MinSum algorithm, respectively. The K-PC algorithms can exploit problem structure to identify special diversity-based and intensity-based starting methods that generate initial pseudo-centroids and associated clusters. Theorems for the intensity-based methods establish their ability to obtain the best clusters of a selected size from the points available at each stage of construction. We also introduce a regret-threshold PC algorithm that modifies the K-PC algorithm, together with an associated diversification method and a new criterion for evaluating the quality of a collection of clusters.
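To make the K-Means analogy concrete, the following is a minimal sketch of a K-PC style loop that works only from a pairwise distance matrix, with no coordinates. It rests on assumptions the abstract does not spell out: a MinSum-centroid is taken to be the cluster member with the smallest sum of distances to the points in its cluster, a MinMax-centroid the member with the smallest maximum distance, and weights are omitted. The names `pseudo_centroid`, `k_pc`, `dist`, `seeds` and `score` are hypothetical, and the starting points are simply passed in rather than produced by the diversity-based or intensity-based methods.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def pseudo_centroid(cluster: List[int], dist: Matrix,
                    score: Callable[[List[float]], float]) -> int:
    """Return the cluster member whose distances to the cluster score lowest.

    score=sum gives the assumed MinSum-centroid; score=max the MinMax-centroid.
    """
    return min(cluster, key=lambda c: score([dist[c][p] for p in cluster]))


def k_pc(dist: Matrix, seeds: Sequence[int],
         score: Callable[[List[float]], float] = sum,
         max_iter: int = 100) -> Tuple[List[int], List[List[int]]]:
    """Alternate assignment and pseudo-centroid updates until stable."""
    centroids = list(seeds)
    clusters: List[List[int]] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest pseudo-centroid.
        clusters = [[] for _ in centroids]
        for p in range(len(dist)):
            nearest = min(range(len(centroids)),
                          key=lambda j: dist[p][centroids[j]])
            clusters[nearest].append(p)
        # Update step: recompute each pseudo-centroid from its members;
        # an empty cluster keeps its previous pseudo-centroid.
        updated = [pseudo_centroid(c, dist, score) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Illustrative dissimilarity matrix with two obvious groups: {0,1,2}, {3,4,5}.
dist = [
    [0, 1, 2, 9, 9, 9],
    [1, 0, 1, 9, 9, 9],
    [2, 1, 0, 9, 9, 9],
    [9, 9, 9, 0, 1, 3],
    [9, 9, 9, 1, 0, 1],
    [9, 9, 9, 3, 1, 0],
]
print(k_pc(dist, seeds=[0, 5], score=sum))  # K-MinSum style run
print(k_pc(dist, seeds=[0, 5], score=max))  # K-MinMax style run
```

Passing the scoring rule as a function keeps one loop for both variants, which mirrors how the abstract presents K-MinMax and K-MinSum as instances of the same K-PC scheme. If `max_iter` is reached before the pseudo-centroids stabilize, the returned clusters reflect the last assignment step rather than the final update.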
