Paper Information - Integrated Framework Using Frequent Pattern for Clustering Numeric and Nominal Data Sets

Clustering is an exploratory data mining technique that places objects with the highest degree of similarity in the same group. Real-world data are usually mixed in nature, i.e., they can contain both numeric and nominal attributes. Performance degradation is a major challenge in existing mixed data clustering methods because of multiple iterations and increased complexity. We propose an integrated framework based on frequent pattern analysis, the frequent pattern-based framework for mixed data clustering (FPMC) algorithm, which clusters mixed data efficiently by performing a one-time clustering along with attribute reduction. The algorithm follows the divide-and-conquer paradigm and has three phases: crack, transformation, and merging. The results are promising when the algorithm is applied to benchmark datasets.
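The abstract names the three phases but does not define them, so the following is only a rough illustration of the general idea and not the FPMC algorithm itself. It assumes that "crack" means splitting records into numeric and nominal parts, that "transformation" means discretizing numeric values into equal-width bins, and that "merging" means recombining both parts into transactions from which frequent patterns are counted. The function names, the toy records, and the choice of equal-width binning are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def crack(records):
    """Split each record into numeric and nominal parts (assumed reading of 'crack')."""
    numeric = [{k: v for k, v in r.items() if isinstance(v, (int, float))} for r in records]
    nominal = [{k: v for k, v in r.items() if isinstance(v, str)} for r in records]
    return numeric, nominal


def transform(numeric, bins=2):
    """Map each numeric attribute to an equal-width bin label (an assumption, not the paper's rule)."""
    keys = list(numeric[0].keys())
    lo = {k: min(r[k] for r in numeric) for k in keys}
    hi = {k: max(r[k] for r in numeric) for k in keys}
    binned = []
    for r in numeric:
        row = {}
        for k in keys:
            width = (hi[k] - lo[k]) / bins or 1.0  # guard against constant columns
            idx = min(int((r[k] - lo[k]) / width), bins - 1)
            row[k] = f"bin{idx}"
        binned.append(row)
    return binned


def merge(binned, nominal):
    """Recombine both parts into transactions of 'attribute=value' items."""
    return [
        frozenset(f"{k}={v}" for k, v in {**b, **n}.items())
        for b, n in zip(binned, nominal)
    ]


def frequent_itemsets(transactions, min_support, size=2):
    """Count itemsets of a fixed size and keep those meeting the support threshold."""
    counts = Counter()
    for t in transactions:
        for combo in combinations(sorted(t), size):
            counts[combo] += 1
    return {c: n for c, n in counts.items() if n >= min_support}


# Toy mixed records (hypothetical data, not from the paper's benchmarks).
records = [
    {"age": 25, "income": 30.0, "color": "red"},
    {"age": 27, "income": 32.0, "color": "red"},
    {"age": 60, "income": 90.0, "color": "blue"},
    {"age": 62, "income": 95.0, "color": "blue"},
]

numeric, nominal = crack(records)
transactions = merge(transform(numeric), nominal)
patterns = frequent_itemsets(transactions, min_support=2)
for itemset, support in sorted(patterns.items()):
    print(itemset, support)
```

In this sketch, records that share the same frequent patterns (for example `age=bin0` together with `color=red`) are natural candidates for the same cluster, which is the intuition behind using frequent patterns for a one-pass grouping. How FPMC actually forms clusters and reduces attributes is described in the paper, not here.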

M. V. Judy | Sreeja Ashok | Aswathy Asok | T. J. Jisha
