A Scalable Execution Engine for Package Queries

Many modern applications and real-world problems involve the design of item collections, or packages: from planning your daily meals all the way to mapping the universe. Despite the pervasive need for packages, traditional data management does not offer support for their definition and computation. This is because traditional database queries follow a powerful but very simple model: a query defines constraints that each tuple in the result must satisfy. However, a system tasked with the design of packages cannot consider items independently; rather, the system needs to determine whether a set of items collectively satisfies given criteria. In this paper, we present package queries, a new query model that extends traditional database queries to handle complex constraints and preferences over answer sets. We develop a full-fledged package query system, implemented on top of a traditional database engine. Our work makes several contributions. First, we design PaQL, a SQL-based query language that supports the declarative specification of package queries. Second, we present a fundamental strategy for evaluating package queries that combines the capabilities of databases and constraint optimization solvers. The core of our approach is a set of translation rules that transform a package query into an integer linear program (ILP). Third, we introduce an offline data partitioning strategy that allows query evaluation to scale to large data sizes. Fourth, we introduce SKETCHREFINE, an efficient and scalable algorithm for package evaluation, which offers strong approximation guarantees. Finally, we present extensive experiments over real-world data. Our results demonstrate that SKETCHREFINE is effective at deriving high-quality package results and achieves runtime performance that is an order of magnitude faster than directly using ILP solvers over large datasets.
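The central step above, translating a package query into an integer linear program, can be illustrated with a small sketch. This is a minimal sketch under stated assumptions, not the paper's translation rules or PaQL syntax. It assumes a hypothetical meal-planning table (`Recipe` with made-up `kcal` and `fat` values), a query asking for exactly three distinct recipes whose total calories lie in a range while minimizing total fat, and one binary variable per tuple (no tuple repeated in a package). Under that encoding, a COUNT constraint becomes a plain sum of the variables, a SUM constraint becomes a weighted sum, and the objective becomes a linear function. The helper names `translate` and `solve_brute_force` are hypothetical, and the brute-force search stands in for a real ILP solver; it is exponential in the number of tuples and only suitable for toy inputs.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Recipe:
    # Hypothetical tuple schema for a meal-planning table.
    name: str
    kcal: float
    fat: float


@dataclass
class LinearConstraint:
    # Encodes lo <= sum(coeffs[i] * x[i]) <= hi.
    coeffs: list
    lo: float
    hi: float


@dataclass
class ILP:
    objective: list  # minimize sum(objective[i] * x[i])
    constraints: list


def translate(recipes, count, kcal_lo, kcal_hi):
    """Build a 0/1 ILP for a hypothetical package query: pick exactly `count`
    recipes with total kcal in [kcal_lo, kcal_hi], minimizing total fat.
    Variable x[i] = 1 means recipe i is in the package."""
    n = len(recipes)
    return ILP(
        objective=[r.fat for r in recipes],  # minimize total fat
        constraints=[
            LinearConstraint([1] * n, count, count),  # package size = count
            LinearConstraint([r.kcal for r in recipes], kcal_lo, kcal_hi),  # total kcal in range
        ],
    )


def solve_brute_force(ilp):
    """Stand-in for an ILP solver: try every 0/1 assignment (exponential)."""
    best_x, best_val = None, float("inf")
    for x in product((0, 1), repeat=len(ilp.objective)):
        feasible = all(
            c.lo <= sum(a * xi for a, xi in zip(c.coeffs, x)) <= c.hi
            for c in ilp.constraints
        )
        if feasible:
            val = sum(o * xi for o, xi in zip(ilp.objective, x))
            if val < best_val:
                best_x, best_val = x, val
    return best_x, best_val  # (None, inf) if no package satisfies the constraints


recipes = [
    Recipe("A", 600, 20),
    Recipe("B", 800, 10),
    Recipe("C", 700, 15),
    Recipe("D", 900, 30),
    Recipe("E", 500, 25),
]
ilp = translate(recipes, count=3, kcal_lo=2000, kcal_hi=2500)
x, fat = solve_brute_force(ilp)
package = [r.name for r, xi in zip(recipes, x) if xi]
print(package, fat)  # ['A', 'B', 'C'] 45
```

Note that the constraints apply to the package as a whole: no single recipe violates anything, yet most three-recipe combinations are rejected because their combined calories fall outside the range. A real system would hand the same coefficients to an ILP solver rather than enumerate assignments; the abstract reports that even directly solving such ILPs becomes slow over large datasets, which is what the offline partitioning strategy and SKETCHREFINE target.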
