Learning-NUM: Network Utility Maximization With Unknown Utility Functions and Queueing Delay

Network Utility Maximization (NUM) studies the problem of allocating traffic rates to network users so as to maximize the users' total utility subject to network resource constraints. In this paper, we propose a new NUM framework, Learning-NUM, in which the users' utility functions are unknown a priori and the utility value of a traffic rate can be observed only after the corresponding traffic is delivered to its destination, so the utility feedback experiences \textit{queueing delay}. The goal is to design a policy that gradually learns the utility functions and makes rate allocation and network scheduling/routing decisions so as to maximize the total utility obtained over a finite time horizon $T$. In addition to unknown utility functions and stochastic constraints, a central challenge of our problem lies in the queueing delay of the observations, which may be unbounded and depends on the decisions of the policy. We first show that the expected total utility obtained by the best dynamic policy is upper bounded by the solution to a static optimization problem. For the case without feedback delay, we design an algorithm based on gradient estimation and Max-Weight scheduling. To handle the feedback delay, we embed this algorithm in a parallel-instance paradigm to form a policy that achieves $\tilde{O}(T^{3/4})$-regret, i.e., the gap between the expected utility obtained by the best dynamic policy and that obtained by our policy is $\tilde{O}(T^{3/4})$. Finally, to demonstrate the practical applicability of the Learning-NUM framework, we apply it to three application scenarios: database query, job scheduling, and video streaming. We further conduct simulations on the job scheduling application to evaluate the empirical performance of our policy.
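The two building blocks named above, gradient estimation and Max-Weight scheduling, can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a generic two-point zeroth-order gradient estimator from the bandit convex optimization literature and a textbook Max-Weight rule over a finite set of candidate service vectors, and it ignores feedback delay entirely. The names `two_point_gradient`, `max_weight_schedule`, `delta`, and the toy utility are hypothetical.

```python
import math
import random


def two_point_gradient(f, x, delta, rng):
    """Estimate the gradient of f at x from two function evaluations.

    Assumption: f can be queried at x + delta*u and x - delta*u, where u is a
    random unit vector. This is a generic zeroth-order estimator, not
    necessarily the one used in the paper.
    """
    d = len(x)
    # Draw a direction uniformly on the unit sphere by normalizing a Gaussian.
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in u))
    u = [v / norm for v in u]
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    # Scaled finite difference along u.
    scale = d * (f(x_plus) - f(x_minus)) / (2.0 * delta)
    return [scale * ui for ui in u]


def max_weight_schedule(queues, schedules):
    """Return the candidate service vector maximizing sum(queue * service)."""
    return max(schedules, key=lambda s: sum(q * r for q, r in zip(queues, s)))


# Toy usage with a hypothetical concave utility of a single rate.
rng = random.Random(0)
utility = lambda r: -(r[0] - 2.0) ** 2
grad = two_point_gradient(utility, [1.0], 0.1, rng)
chosen = max_weight_schedule([5, 1], [(1, 0), (0, 1)])
```

In one dimension the estimator reduces to a central finite difference, so `grad[0]` is the exact derivative of the quadratic at rate 1.0 up to floating-point error. The Max-Weight rule serves the first queue because its backlog is larger.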
