CENTRE-BASED HARD CLUSTERING ALGORITHMS FOR Y-STR DATA

This paper presents centre-based hard clustering approaches for Y-STR data, in which each object is assigned to exactly one cluster. Two classical partitioning techniques are evaluated: the centroid-based partitioning technique and the representative object-based partitioning technique. The k-Means and k-Modes algorithms are the fundamental algorithms of the centroid-based technique, whereas k-Medoids is a representative object-based technique. All three algorithms are tested and evaluated on partitioning Y-STR haplogroup data and Y-STR surname data. The overall results show that the centroid-based technique clusters Y-STR data better than the representative object-based technique.
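The three algorithms share one assign-then-update loop and differ only in the dissimilarity measure and in how a cluster centre is recomputed. The sketch below illustrates this under stated assumptions. Each Y-STR profile is a tuple of integer repeat counts at a fixed, ordered set of loci. The toy profiles, the function names, and the choice of the first k profiles as initial centres are hypothetical and do not reproduce the paper's experimental setup. The k-Medoids update shown is a simple alternating variant, not the swap search of PAM.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Profile = Tuple[int, ...]  # repeat counts at a fixed, ordered set of Y-STR loci (assumed encoding)


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Numerical view used by k-Means: repeat counts treated as real-valued coordinates
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mismatches(a: Sequence[int], b: Sequence[int]) -> int:
    # Categorical view used by k-Modes: simple matching dissimilarity (count of differing loci)
    return sum(x != y for x, y in zip(a, b))


def assign(data, centres, dist) -> List[int]:
    # Hard assignment: each profile joins exactly one cluster, the nearest centre (ties -> lowest index)
    return [min(range(len(centres)), key=lambda j: dist(p, centres[j])) for p in data]


def mean_update(members, dist):
    # k-Means centroid: per-locus arithmetic mean (may be fractional, i.e. not a real allele)
    return tuple(sum(col) / len(members) for col in zip(*members))


def mode_update(members, dist):
    # k-Modes centre: most frequent allele at each locus
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))


def medoid_update(members, dist):
    # k-Medoids representative: the actual member with the smallest total dissimilarity to the others
    return min(members, key=lambda m: sum(dist(m, o) for o in members))


def centre_based(data: List[Profile], k: int, dist: Callable, update: Callable,
                 max_iter: int = 100) -> List[int]:
    centres = [tuple(p) for p in data[:k]]  # hypothetical initialisation: first k profiles
    labels = assign(data, centres, dist)
    for _ in range(max_iter):
        # Recompute each centre from its current members; keep the old centre if a cluster empties
        new_centres = []
        for j in range(k):
            members = [p for p, c in zip(data, labels) if c == j]
            new_centres.append(update(members, dist) if members else centres[j])
        new_labels = assign(data, new_centres, dist)
        centres = new_centres
        if new_labels == labels:  # converged: no profile changed cluster
            break
        labels = new_labels
    return labels


# Hypothetical toy profiles at five loci, interleaving two well-separated lineages
PROFILES: List[Profile] = [
    (14, 24, 11, 13, 13), (16, 22, 10, 11, 12),
    (14, 24, 10, 13, 13), (15, 22, 10, 11, 12),
    (14, 23, 11, 13, 13), (16, 21, 10, 11, 12),
]

for name, dist, update in [("k-Means", squared_euclidean, mean_update),
                           ("k-Modes", mismatches, mode_update),
                           ("k-Medoids", mismatches, medoid_update)]:
    print(name, centre_based(PROFILES, 2, dist, update))
```

On these toy profiles all three variants print the labels `[0, 1, 0, 1, 0, 1]`. The design choice to pass `dist` and `update` as parameters makes the contrast explicit: k-Means and k-Modes compute a synthetic centre (a mean or a per-locus mode), whereas k-Medoids always picks an existing profile as the representative object.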
