A Novel Multilayer Data Clustering Framework based on Feature Selection and Modified K-Means Algorithm

With the rapid development of computer science and technology, data analysis has become one of the most active research areas in the pattern recognition community. Cluster analysis is an important step in data mining, and various multi-objective techniques have been developed that partition data automatically. In this paper, we propose a novel multilayer data clustering framework based on feature selection and a modified K-Means algorithm. To facilitate clustering, the proposed algorithm selects a representative feature subset to reduce the dimensionality of the raw data set. In addition, the selected feature subset has fewer missing values than the raw data set, which may improve clustering accuracy. Another distinctive property of the proposed algorithm is its use of the partial distance strategy. Experimental analysis and simulation indicate the feasibility and robustness of our method. In future work, we plan to conduct further mathematical analysis and refine the algorithm to achieve better results.
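
The abstract does not spell out the feature selection rule or the exact modification to K-Means, so the following is only a minimal sketch of how the two ideas might fit together, not the authors' implementation. It assumes missing values are encoded as NaN, uses a hypothetical selection rule (keep features whose missing ratio is below a threshold), and uses the commonly cited form of the partial distance strategy: sum the squared differences over observed features only, then rescale by (total features / observed features). All function names and parameters here are illustrative.

```python
import math


def partial_distance_sq(x, c):
    """Squared partial distance between point x (may contain NaN) and centroid c."""
    observed = [(xi - ci) ** 2 for xi, ci in zip(x, c) if not math.isnan(xi)]
    if not observed:
        return math.inf  # no observed feature, so the distance is undefined
    # Rescale so that points with different numbers of observed features are comparable.
    return len(x) / len(observed) * sum(observed)


def select_features(data, max_missing_ratio):
    """Assumed rule: keep the column indices whose missing ratio is small enough."""
    n = len(data)
    keep = []
    for j in range(len(data[0])):
        missing = sum(1 for row in data if math.isnan(row[j]))
        if missing / n <= max_missing_ratio:
            keep.append(j)
    return keep


def kmeans_partial(data, centroids, iterations=10):
    """K-Means variant that assigns with partial distances and averages observed values."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assignment step: nearest centroid under the partial distance.
        for i, x in enumerate(data):
            dists = [partial_distance_sq(x, c) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: per-feature mean over the observed values in each cluster.
        for k, c in enumerate(centroids):
            for j in range(len(c)):
                vals = [x[j] for x, lab in zip(data, labels)
                        if lab == k and not math.isnan(x[j])]
                if vals:
                    c[j] = sum(vals) / len(vals)
    return labels, centroids


if __name__ == "__main__":
    nan = math.nan
    raw = [[1.0, 1.0, nan], [1.2, nan, nan], [0.8, 1.1, nan],
           [5.0, 5.0, 3.0], [5.2, nan, nan], [4.9, 5.1, nan]]
    cols = select_features(raw, max_missing_ratio=0.5)
    reduced = [[row[j] for j in cols] for row in raw]
    labels, centers = kmeans_partial(reduced, [reduced[0], reduced[3]])
    print(cols, labels, centers)
```

In this sketch the third feature is dropped because it is almost entirely missing, which mirrors the abstract's point that the selected subset contains fewer missing values than the raw data; the remaining gaps are then handled by the partial distance rather than by imputation.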
