A survey on clustering in data mining

Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. Unsupervised learning (clustering) deals with which have not been pre classified in any way and so do not have a class attribute associated with them. The scope of applying clustering algorithm is to discover useful but unknown classes of items. Unsupervised learning is an approach of learning where instances are automatically placed into meaningful groups based on their similarity. This paper addresses fundamental concepts of unsupervised learning while it serveys recent clustering algorithm and their complexities.

[1]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[2]  B. S. Duran,et al.  Cluster Analysis: A Survey , 1974 .

[3]  B. S. Duran,et al.  Cluster Analysis: A Survey , 1976 .

[4]  Dimitrios Gunopulos,et al.  Automatic subspace clustering of high dimensional data for data mining applications , 1998, SIGMOD '98.

[5]  Fionn Murtagh,et al.  A Survey of Recent Advances in Hierarchical Clustering Algorithms , 1983, Comput. J..

[6]  Tian Zhang,et al.  BIRCH: A New Data Clustering Algorithm and Its Applications , 1997, Data Mining and Knowledge Discovery.

[7]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.

[8]  Daniel P. Fasulo,et al.  An Analysis of Recent Work on Clustering Algorithms , 1999 .

[9]  Peter C. Cheeseman,et al.  Bayesian Classification (AutoClass): Theory and Results , 1996, Advances in Knowledge Discovery and Data Mining.

[10]  Anil K. Jain,et al.  Clustering Methodologies in Exploratory Data Analysis , 1980, Adv. Comput..

[11]  Claire Cardie,et al.  Proceedings of the Eighteenth International Conference on Machine Learning, 2001, p. 577–584. Constrained K-means Clustering with Background Knowledge , 2022 .

[12]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[13]  Zhang Yi,et al.  Clustering Categorical Data , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[14]  Le Gruenwald,et al.  A survey of data mining and knowledge discovery software tools , 1999, SKDD.

[15]  Sudipto Guha,et al.  CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.