Paper Information - Knowledge-Conscious Data Clustering

Knowledge-Conscious Data Clustering

We consider the problem of efficiently executing data clustering queries in a client-server setting. Existing solutions to this problem suffer from (a) a significant amount of remote I/O and (b) minimal re-use of computation, both between iterations of a single kMeans query and between executions of different kMeans queries. We propose to facilitate interactive kMeans clustering by employing a client-side knowledge-cache. This knowledge-cache is succinct and significantly reduces the amount of remote I/O needed during execution. Furthermore, it permits the re-use of computation, both within and between executions of kMeans queries.

Srinivasan Parthasarathy | Amol Ghoting

[1] Philip S. Yu, et al. A Framework for Clustering Evolving Data Streams, 2003, VLDB.

[2] Anil K. Jain, et al. Large-scale parallel data clustering, 1996, Proceedings of 13th International Conference on Pattern Recognition.

[3] Srinivasan Parthasarathy, et al. Shared State for Distributed Interactive Data Mining Applications, 2002, Distributed and Parallel Databases.

[4] Andrew W. Moore, et al. Accelerating exact k-means algorithms with geometric reasoning, 1999, KDD '99.

[5] David J. DeWitt, et al. Using a knowledge cache for interactive discovery of association rules, 1999, KDD '99.