Mining Thick Skylines over Large Databases

Recently, there has been growing interest in a new operator, called skyline [3], which returns the objects that are not dominated by any other object with regard to certain measures in a multi-dimensional space. Recent work on the skyline operator [3,15,8,13,2] focuses on efficient computation of skylines in large databases. However, such work gives users only thin skylines, i.e., single objects, which may not be desirable in some real applications. In this paper, we propose a novel concept, called thick skyline, which recommends not only skyline objects but also their nearby neighbors within ε-distance. We develop efficient computation methods, including (1) two algorithms, Sampling-and-Pruning and Indexing-and-Estimating, which find the thick skyline with the help of statistics or indexes in large databases, and (2) a highly efficient Microcluster-based algorithm for mining the thick skyline. The Microcluster-based method not only leads to substantial savings in computation but also provides a concise representation of the thick skyline when its cardinality is high. Our experimental performance study shows that the proposed methods are both efficient and effective.
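To make the two definitions concrete, here is a minimal brute-force sketch of a skyline and a thick skyline. It rests on several assumptions: smaller values are better in every dimension, distance is Euclidean, and the thick skyline is taken to be the skyline points plus every point within ε of some skyline point. The function names `dominates`, `skyline`, and `thick_skyline` are hypothetical. This quadratic-time check only illustrates the definitions. It is not the Sampling-and-Pruning, Indexing-and-Estimating, or Microcluster-based algorithm described above.

```python
from math import dist  # Euclidean distance, Python 3.8+


def dominates(p, q):
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension, so p dominates q when
    p is no worse in all dimensions and strictly better in at least one.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def thick_skyline(points, eps):
    """Skyline points plus every point within Euclidean distance eps of a skyline point.

    A skyline point is at distance 0 from itself, so it is always included.
    """
    sky = skyline(points)
    return [p for p in points if any(dist(p, s) <= eps for s in sky)]


points = [(1, 5), (2, 2), (5, 1), (2.5, 2.5), (4, 4)]
print(skyline(points))             # [(1, 5), (2, 2), (5, 1)]
print(thick_skyline(points, 1.0))  # [(1, 5), (2, 2), (5, 1), (2.5, 2.5)]
```

In this example, (2.5, 2.5) is dominated by (2, 2), so the thin skyline omits it. It lies about 0.71 from (2, 2), however, so the thick skyline with ε = 1 recommends it as a nearby neighbor. The point (4, 4) is farther than ε from every skyline point and is excluded from both results.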

[1] Jirí Matousek et al. Computing Dominances in E^n, 1991, Inf. Process. Lett.

[2] Ronald L. Rivest et al. Introduction to Algorithms, 1990.

[3] Ivan Stojmenovic et al. An optimal parallel algorithm for solving the maximal elements problem in the plane, 1988, Parallel Comput.

[4] Donald Kossmann et al. Shooting Stars in the Sky: An Online Algorithm for Skyline Queries, 2002, VLDB.

[5] Donald Kossmann et al. The Skyline operator, 2001, Proceedings 17th International Conference on Data Engineering.

[6] Daniel A. Keim et al. Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering, 1999, VLDB.

[7] Frank Nielsen et al. Output-Sensitive Peeling of Convex and Maximal Layers, 1996, Inf. Process. Lett.

[8] Bernhard Seeger et al. An optimal and progressive algorithm for skyline queries, 2003, SIGMOD '03.

[9] Ronald L. Rivest et al. Introduction to Algorithms, Second Edition, 2001.

[10] Beng Chin Ooi et al. Efficient Progressive Skyline Computation, 2001, VLDB.

[11] Anthony K. H. Tung et al. Mining top-n local outliers in large databases, 2001, KDD '01.

[12] Kenneth L. Clarkson et al. Fast linear expected-time algorithms for computing maxima and convex hulls, 1993, SODA '90.

[13] Tian Zhang et al. BIRCH: an efficient data clustering method for very large databases, 1996, SIGMOD '96.

[14] Wolf-Tilo Balke et al. Efficient Distributed Skylining for Web Information Systems, 2004, EDBT.

[15] Michael Ian Shamos et al. Computational geometry: an introduction, 1985.

[16] Russ Bubley et al. Randomized algorithms, 1995, CSUR.