Paper Information - A Marketplace for Data: An Algorithmic Solution

In this work, we aim to design a data marketplace: a robust, real-time matching mechanism to efficiently buy and sell training data for Machine Learning tasks. While the monetization of data and pre-trained models is an essential focus of industry today, no market mechanism exists to price training data and match buyers to sellers while still addressing the associated computational and other complexity. The challenge in creating such a market stems from the very nature of data as an asset: (i) it is freely replicable; (ii) its value is inherently combinatorial due to correlation with signal in other data; (iii) prediction tasks and the value of accuracy vary widely; (iv) the usefulness of training data is difficult to verify a priori without first applying it to a prediction task. Our main contributions are to: (i) propose a mathematical model for a two-sided data market and formally define the key associated challenges; (ii) construct algorithms for such a market to function and analyze how they meet the challenges defined. We highlight two technical contributions: (i) a new notion of "fairness" required for cooperative games with freely replicable goods; (ii) a truthful, zero-regret mechanism to auction a class of combinatorial goods, based on Myerson's payment function and the Multiplicative Weights algorithm. These might be of independent interest.
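To give a feel for the Multiplicative Weights algorithm named in the abstract, the sketch below runs its exponential-weights (Hedge) form over a small grid of candidate prices. It is an illustration only, not the paper's mechanism: the price grid, the constant stream of bids, the `revenue_gain` function, and the step size `eta` are all assumptions made for this example, and Myerson's payment function is not modeled.

```python
import math


def multiplicative_weights(gains_per_round, eta=0.5):
    """Exponential-weights (Hedge) form of Multiplicative Weights.

    gains_per_round: list of rounds; each round holds one gain in [0, 1]
    per expert (here, hypothetically, one expert per candidate price).
    Returns the final probability distribution over experts.
    """
    n = len(gains_per_round[0])
    weights = [1.0 / n] * n
    for gains in gains_per_round:
        # Experts that earned more this round are up-weighted multiplicatively.
        weights = [w * math.exp(eta * g) for w, g in zip(weights, gains)]
        total = sum(weights)
        # Renormalize so the weights remain a probability distribution.
        weights = [w / total for w in weights]
    return weights


def revenue_gain(price, bid, max_price):
    """Hypothetical gain: revenue if the bid meets the price, scaled to [0, 1]."""
    return price / max_price if bid >= price else 0.0


# Assumed setup: four candidate prices and a buyer who always bids 3.0.
candidate_prices = [1.0, 2.0, 3.0, 4.0]
bids = [3.0] * 20
rounds = [[revenue_gain(p, b, 4.0) for p in candidate_prices] for b in bids]

probs = multiplicative_weights(rounds)
best_price = candidate_prices[probs.index(max(probs))]
```

In this toy setting the distribution concentrates on the highest price that the bids still clear, which is the behavior a no-regret learner is expected to show against the best fixed price in hindsight.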
