Distributed Optimization and Data Market Design

This thesis studies algorithms for distributed optimization and their applications. We propose a new approach to distributed optimization based on an emerging area of theoretical computer science: local computation algorithms. The approach is fundamentally different from existing methodologies and provides a number of benefits, such as robustness to link failure and adaptivity to dynamic settings. Specifically, we develop an algorithm, LOCO, that, given a convex optimization problem P with n variables and a “sparse” linear constraint matrix with m constraints, provably finds a solution as good as that of the best online algorithm for P using only O(log(n + m)) messages with high probability. The approach is not iterative, and communication is restricted to a localized neighborhood. In addition to analytic results, we show numerically that the performance improvements over classical approaches to distributed optimization are significant; for example, LOCO uses orders of magnitude less communication than ADMM.

We also consider the operation of a geographically distributed cloud data market. The design decisions include which data to purchase (data purchasing) and where to place or replicate the data for delivery (data placement). We show that a joint approach to data purchasing and data placement within a cloud data market reduces operating costs. This problem can be viewed as a facility location problem and is thus NP-hard. However, we give a provably optimal algorithm for a data market consisting of a single data center, and then generalize that result to develop a near-optimal, polynomial-time algorithm for a geo-distributed data market. The resulting design, Datum, decomposes the joint purchasing and placement problem into two subproblems, one for data purchasing and one for data placement, using a transformation of the underlying bandwidth costs. We show, via a case study, that Datum is near-optimal (within 1.6%) in practical settings.
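
To make the facility location connection concrete, the following is a minimal sketch of the generic, textbook uncapacitated facility location problem, solved by exhaustive search on a toy instance. It is not Datum and not an algorithm from the thesis. The function name `solve_ufl_brute_force`, the cost values, and the reading of "facilities" as candidate placement sites and "clients" as consumers of the data are all assumptions made for illustration. The search enumerates every non-empty subset of facilities, which is exponential in their number and hints at why a polynomial-time, near-optimal method is of interest.

```python
from itertools import combinations


def solve_ufl_brute_force(open_cost, serve_cost):
    """Exhaustively solve a tiny uncapacitated facility location instance.

    open_cost[i]     : fixed cost of opening facility i (hypothetical values)
    serve_cost[i][j] : cost of serving client j from facility i
    Returns (best_total_cost, tuple_of_open_facility_indices).
    """
    n_fac = len(open_cost)
    n_cli = len(serve_cost[0])
    best = (float("inf"), ())
    # Try every non-empty subset of facilities: 2^n_fac - 1 candidates.
    for k in range(1, n_fac + 1):
        for subset in combinations(range(n_fac), k):
            fixed = sum(open_cost[i] for i in subset)
            # Each client is served by its cheapest open facility.
            service = sum(
                min(serve_cost[i][j] for i in subset) for j in range(n_cli)
            )
            total = fixed + service
            if total < best[0]:
                best = (total, subset)
    return best


# Toy instance: 3 candidate sites, 3 clients (illustrative numbers only).
open_cost = [4, 3, 6]
serve_cost = [
    [1, 5, 9],
    [6, 1, 4],
    [9, 4, 1],
]
best_cost, best_sites = solve_ufl_brute_force(open_cost, serve_cost)
print(best_cost, best_sites)
```

On this instance, opening sites 0 and 1 costs 7 in fixed cost plus 6 in service cost, which beats every other subset. Opening a third site would lower service cost but not by enough to cover its fixed cost, which is the trade-off at the core of the problem.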
