From stars to galaxies: skyline queries on aggregate data

The skyline operator extracts relevant records from multidimensional databases according to multiple criteria. It has received considerable attention because it identifies the best records in a database without requiring the user to specify complex parameters such as the relative importance of each criterion. However, it has only been defined over single records, whereas aggregation, which enables operations over sets of records, is a fundamental functionality of database query languages. In this paper we introduce aggregate skylines, where the skyline works as a filtering predicate on sets of records. This operator can express queries of the form "return the best groups depending on the features of their elements", and thus provides a powerful combination of grouping and skyline functionality. We define a semantics for aggregate skylines based on a sound theoretical framework and study its computational complexity. We propose efficient algorithms to implement this operator and test them on real and synthetic data, showing that they outperform a direct SQL implementation by up to two orders of magnitude.
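
To make the distinction between record-level and group-level skylines concrete, the following is a minimal Python sketch. The record-level part uses the standard Pareto-dominance definition of the skyline (here assuming larger values are better on every criterion). The group-level part is only an illustrative assumption: the abstract does not state the semantics of aggregate skylines, so the pairwise-fraction rule, the `threshold` parameter, and the function names `group_dominates` and `aggregate_skyline` are hypothetical and should not be read as the paper's definition or algorithms.

```python
from itertools import product


def dominates(a, b):
    """Pareto dominance: a is at least as good as b on every criterion
    and strictly better on at least one (larger is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(records):
    """Record-level skyline: keep the records that no other record dominates."""
    return [r for r in records if not any(dominates(s, r) for s in records)]


def group_dominates(g, h, threshold=0.5):
    """ASSUMPTION (illustrative only): group g dominates group h when more than
    `threshold` of all record pairs (a in g, b in h) satisfy a dominates b."""
    pairs = list(product(g, h))
    wins = sum(dominates(a, b) for a, b in pairs)
    return wins / len(pairs) > threshold


def aggregate_skyline(groups, threshold=0.5):
    """Group-level skyline under the assumed rule: keep the names of groups
    that no other group dominates."""
    return sorted(
        name
        for name, recs in groups.items()
        if not any(
            group_dominates(other_recs, recs, threshold)
            for other, other_recs in groups.items()
            if other != name
        )
    )


# Hypothetical data: three groups of two-dimensional records.
groups = {
    "A": [(5, 5), (4, 6)],
    "B": [(1, 1), (2, 2)],
    "C": [(6, 1), (1, 6)],
}
all_records = [r for recs in groups.values() for r in recs]

record_level = skyline(all_records)      # best individual records
group_level = aggregate_skyline(groups)  # best groups under the assumed rule
```

With this data, the record-level skyline returns individual records regardless of which group they belong to, while the group-level query filters out group B because every record in A dominates every record in B. In SQL terms, the second query plays the role of a skyline predicate applied after a GROUP BY.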
