CUDAP: A Novel Clustering Algorithm for Uncertain Data Based on Approximate Backbone

Clustering uncertain data is an interesting research topic in data mining. Researchers often define the uncertain data clustering problem as a combinatorial optimization model. Heuristic clustering algorithms are an efficient way to solve this kind of problem, but sensitivity to initialization is one of their inherent drawbacks. In this paper, we propose a novel clustering algorithm named CUDAP (Clustering algorithm for Uncertain Data based on Approximate backbone). CUDAP (1) randomly samples the original uncertain data set D_m M times to generate M sampled data sets DS = {Ds_1, Ds_2, …, Ds_M}; (2) obtains M locally optimal clustering results P = {C_1, C_2, …, C_M} by running the UK-Medoids algorithm on each sampled data set Ds_i, i = 1, …, M; (3) uses a greedy search algorithm to find the approximate backbone (APB) of P; and (4) runs UK-Medoids again on the original uncertain data set D_m, guided by a new initialization generated from the APB. Experimental results on synthetic and real-world data sets demonstrate the superiority of the proposed approach in terms of clustering quality measures.
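
The four steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: each uncertain object is assumed to be a list of sample points, a plain alternating k-medoids over expected distances stands in for UK-Medoids, and the greedy APB search (which the abstract does not specify) is replaced by a simple heuristic that starts from the most frequently recurring local medoid and then adds pooled medoids farthest-first. All function names and the parameters `M`, `frac`, and `seed` are hypothetical.

```python
import random

def expected_distance(a, b):
    """Mean Euclidean distance over all pairs of sample points of objects a and b."""
    total = 0.0
    for p in a:
        for q in b:
            total += sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    return total / (len(a) * len(b))

def k_medoids(dist, members, init, max_iter=100):
    """Alternating k-medoids restricted to `members` (stand-in for UK-Medoids)."""
    medoids = list(init)
    for _ in range(max_iter):
        clusters = [[] for _ in medoids]
        for i in members:
            j = min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            clusters[j].append(i)
        # New medoid of each cluster: the member with the smallest total distance.
        new = [min(c, key=lambda m: sum(dist[m][i] for i in c)) if c else medoids[j]
               for j, c in enumerate(clusters)]
        if new == medoids:
            break
        medoids = new
    labels = {i: min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
              for i in members}
    return medoids, labels

def init_from_local_optima(dist, pooled, k):
    """Assumed stand-in for the APB search: most frequent local medoid first,
    then greedily add the pooled medoid farthest from those already chosen."""
    candidates = sorted(set(pooled))
    chosen = [max(candidates, key=pooled.count)]
    while len(chosen) < k:
        chosen.append(max(candidates,
                          key=lambda c: min(dist[c][m] for m in chosen)))
    return chosen

def cudap_sketch(objects, k, M=5, frac=0.7, seed=0):
    rng = random.Random(seed)
    n = len(objects)
    # Pairwise expected distances; the diagonal is set to 0 (an assumption)
    # so that every medoid is always assigned to its own cluster.
    dist = [[0.0 if i == j else expected_distance(a, b)
             for j, b in enumerate(objects)] for i, a in enumerate(objects)]
    pooled = []
    for _ in range(M):
        sample = rng.sample(range(n), max(k, int(frac * n)))          # step 1
        medoids, _labels = k_medoids(dist, sample, rng.sample(sample, k))  # step 2
        pooled.extend(medoids)
    init = init_from_local_optima(dist, pooled, k)                    # step 3
    return k_medoids(dist, list(range(n)), init)                      # step 4
```

Calling `cudap_sketch(objects, k=3)` returns the final medoid indices and a dictionary mapping each object index to its cluster. The point of the design is that the final run starts from medoids that several local optima agree on, instead of from a single random initialization.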
