Optimal Clustering: Genetic Constrained K-Means and Linear Programming Algorithms
By Jianmin Zhao, M.S.

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Virginia Commonwealth University.

Virginia Commonwealth University, 2006

Major Director: Robert E. Johnson, Ph.D., Department of Biostatistics

Methods for determining clusters of data under specified constraints have recently gained popularity. Although general constraints may be used, we focus on clustering methods with the constraint of a minimal cluster size. In this dissertation, we propose two constrained k-means algorithms: the Linear Programming Algorithm (LPA) and the Genetic Constrained K-means Algorithm (GCKA).

The Linear Programming Algorithm recasts the k-means algorithm as a linear programming problem with constraints requiring that each cluster have m or more subjects. To achieve an acceptable clustering solution, we run the algorithm with a large number of random sets of initial seeds and choose the solution with minimal Root Mean Squared Error (RMSE) as the final solution for a given data set. We evaluate LPA with both generic data and simulated data, and the results indicate that LPA can obtain a reasonable clustering solution.

The Genetic Constrained K-means Algorithm (GCKA) hybridizes the Genetic Algorithm with a constrained k-means algorithm. We define a Selection Operator, a Mutation Operator, and a Constrained K-means Operator. Using finite Markov chain theory, we prove that GCKA converges in probability to the global optimum. We test the algorithm with several datasets. The analysis shows that a good clustering solution can be achieved by carefully choosing parameters such as population size, mutation probability, and number of generations. We also propose a Bi-Nelder algorithm to search for an appropriate cluster number with minimal RMSE.
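The restart-and-select strategy described in the abstract (many random sets of initial seeds, keep the solution with minimal RMSE, every cluster holding at least m subjects) can be sketched as follows. This is a minimal illustration and not the dissertation's LPA: the linear programming assignment step is replaced by a simple greedy quota-filling heuristic so that the sketch needs only the Python standard library. All function names are hypothetical, and RMSE is assumed here to mean the root mean squared Euclidean distance from each point to its cluster center.

```python
import math
import random


def rmse(points, labels, centers):
    """Root mean squared distance from each point to its assigned center."""
    sq = sum(math.dist(p, centers[c]) ** 2 for p, c in zip(points, labels))
    return math.sqrt(sq / len(points))


def assign_min_size(points, centers, m):
    """Assign points to centers so that every cluster gets at least m points.

    Greedy stand-in for the LP step (an assumption, not the author's method):
    fill each cluster's quota of m from the cheapest remaining
    (point, cluster) pairs, then send leftovers to their nearest center.
    Requires len(points) >= len(centers) * m.
    """
    k = len(centers)
    labels = [None] * len(points)
    need = [m] * k
    pairs = sorted(
        (math.dist(p, centers[c]), i, c)
        for i, p in enumerate(points)
        for c in range(k)
    )
    for _, i, c in pairs:
        if labels[i] is None and need[c] > 0:
            labels[i] = c
            need[c] -= 1
    for i, p in enumerate(points):
        if labels[i] is None:
            labels[i] = min(range(k), key=lambda c: math.dist(p, centers[c]))
    return labels


def update_centers(points, labels, k):
    """Recompute each center as the mean of its members."""
    dim = len(points[0])
    centers = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centers.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        )
    return centers


def constrained_kmeans(points, k, m, rng, max_iter=50):
    """One run from random initial seeds; stops when labels no longer change."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = assign_min_size(points, centers, m)
        if new_labels == labels:
            break
        labels = new_labels
        centers = update_centers(points, labels, k)
    return labels, centers


def best_of_restarts(points, k, m, restarts=20, seed=0):
    """Run several random restarts and keep the solution with minimal RMSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        labels, centers = constrained_kmeans(points, k, m, rng)
        score = rmse(points, labels, centers)
        if best is None or score < best[0]:
            best = (score, labels, centers)
    return best


# Toy data: two groups of four points plus one point in between.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
score, labels, centers = best_of_restarts(points, k=2, m=4)
print(score, labels)
```

The greedy assignment guarantees feasibility but not optimality of each assignment step. An exact version would solve that step as a linear program with one minimum-size constraint per cluster, as the abstract describes.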