A Novel Auction-Based Query Pricing Schema

Queries are a common processing primitive in many areas, such as graph processing, machine learning, and statistics. However, queries are usually priced according to vendor-specified fixed views (APIs) or the number of transactions, which ignores query heterogeneity (the computing resources a query consumes and the information its answer carries) and violates microeconomic principles. In this work, we study the relational query pricing problem and design efficient auctions that take into account both information (i.e., data) value and query resource consumption. Unlike existing query pricing schemes, a query auction determines data prices that reflect the demand–supply of shared computing resources and the information value (i.e., price discovery). We target a query auction that runs in polynomial time, achieves near-optimal social welfare with a good approximation ratio, and elicits truthful bids from consumers. Toward these goals, we adapt the posted pricing framework from a game-theoretic perspective by casting the query auction design as an Integer Linear Programming problem, and we design a primal-dual algorithm to approximate the NP-hard optimization problem. Theoretical analysis and empirical studies driven by a real-world data market benchmark verify the efficiency of our query auction schema.
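
To make the posted-pricing idea concrete, the following is a minimal sketch of a generic posted-price allocation rule in the primal-dual style. It is not the algorithm of this paper, whose details the abstract does not give. The names `Bid`, `unit_price`, and `run_posted_price_auction`, the exponential price curve, and the price bounds `low` and `high` are all assumptions made for illustration. Each resource carries a per-unit price (playing the role of a dual variable) that rises with utilization, and a bid wins only if its reported value covers the posted cost of the resources it demands. Because the payment is computed from the prices alone and never from the bid value, a winner cannot lower its payment by misreporting, which is the intuition behind truthfulness in posted-price mechanisms.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    bidder: str    # hypothetical consumer identifier
    value: float   # reported value for having the query answered
    demand: dict   # resource name -> units the query would consume


def unit_price(used, capacity, low, high):
    """Assumed price curve: rises from `low` (idle) to `high` (fully used)."""
    return low * (high / low) ** (used / capacity)


def run_posted_price_auction(bids, capacity, low=1.0, high=16.0):
    """Process bids in order; return (outcomes, final resource usage).

    Every resource named in a bid's demand is assumed to appear in `capacity`.
    Each outcome is a tuple (bidder, accepted, payment).
    """
    used = {r: 0.0 for r in capacity}
    outcomes = []
    for bid in bids:
        # Feasibility: the demand must fit into the remaining capacity.
        fits = all(used[r] + q <= capacity[r] for r, q in bid.demand.items())
        # Posted cost depends only on current prices, not on bid.value.
        cost = sum(
            q * unit_price(used[r], capacity[r], low, high)
            for r, q in bid.demand.items()
        )
        if fits and bid.value >= cost:
            for r, q in bid.demand.items():
                used[r] += q  # raising usage raises the price for later bids
            outcomes.append((bid.bidder, True, cost))
        else:
            outcomes.append((bid.bidder, False, 0.0))
    return outcomes, used
```

Calling `run_posted_price_auction` with a list of `Bid` objects and a capacity dictionary such as `{"cpu": 4.0}` returns which bids win and what each winner pays. The choice of `low` and `high` governs how aggressively capacity is reserved for later, higher-value bids.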
