Improved Bisector pruning for uncertain data mining

Uncertain data mining is a well-studied and very challenging task. This paper concentrates on clustering uncertain objects with location uncertainty, where each uncertain location is described by a probability density function (PDF). The number of uncertain objects can be very large, and obtaining a quality result within a reasonable time is challenging. The basic clustering method is UK-means, in which all expected distances (ED) from objects to clusters are calculated, which makes UK-means inefficient. To avoid ED calculations, various pruning methods have been proposed. These pruning methods are significantly more effective than the plain UK-means method. In this paper, an Improved Bisector pruning method is proposed as an improvement of the clustering process.
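
To make the cost argument concrete, the following is a minimal sketch of a brute-force UK-means assignment step next to a basic bisector-style pruning test on each object's minimum bounding rectangle (MBR). It is not the Improved Bisector method proposed in the paper, whose details the abstract does not give. It assumes that each PDF is represented by a finite list of equally weighted sample points and that ED is the mean Euclidean distance over those samples. All function names are hypothetical.

```python
import math

def expected_distance(samples, center):
    """Approximate ED as the mean Euclidean distance over the PDF samples (assumption)."""
    return sum(math.dist(s, center) for s in samples) / len(samples)

def assign_brute_force(objects, centers):
    """UK-means style assignment: compute every ED, then pick the closest center."""
    labels = []
    for samples in objects:
        eds = [expected_distance(samples, c) for c in centers]
        labels.append(min(range(len(centers)), key=eds.__getitem__))
    return labels

def mbr(samples):
    """Axis-aligned minimum bounding rectangle of the samples as (lo, hi)."""
    dims = range(len(samples[0]))
    lo = [min(s[d] for s in samples) for d in dims]
    hi = [max(s[d] for s in samples) for d in dims]
    return lo, hi

def box_strictly_closer(lo, hi, p, q):
    """True if every point x of the box [lo, hi] is strictly closer to p than to q.

    ||x - p||^2 < ||x - q||^2 is equivalent to 2 (q - p) . x < ||q||^2 - ||p||^2,
    and the left side is linear in x, so it is maximal at a corner of the box.
    """
    worst = 0.0
    for l, h, pi, qi in zip(lo, hi, p, q):
        w = 2.0 * (qi - pi)
        worst += w * h if w > 0 else w * l
    rhs = sum(qi * qi for qi in q) - sum(pi * pi for pi in p)
    return worst < rhs

def assign_with_bisector_pruning(objects, centers):
    """Same assignment, but skip ED for centers ruled out by a bisector test."""
    labels, ed_calls = [], 0
    k = range(len(centers))
    for samples in objects:
        lo, hi = mbr(samples)
        # A center q is pruned if the whole MBR lies on another center's side
        # of the perpendicular bisector between the two centers.
        alive = [q for q in k
                 if not any(p != q and box_strictly_closer(lo, hi, centers[p], centers[q])
                            for p in k)]
        if len(alive) == 1:
            labels.append(alive[0])  # no ED calculation needed
            continue
        eds = {q: expected_distance(samples, centers[q]) for q in alive}
        ed_calls += len(alive)
        labels.append(min(alive, key=eds.__getitem__))
    return labels, ed_calls
```

If every sample of an object is strictly closer to one center than to another, its ED to the first is also smaller, so a pruned center can never be the closest one. Under the stated assumptions the pruned assignment therefore agrees with the brute-force one, while computing ED only for objects whose MBR straddles a bisector.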
