On-line sampling-based control for network queueing problems

This thesis proposes novel on-line sampling algorithms for control in (possibly partially observable) Markov decision processes (MDPs). We employ a receding horizon control framework: we fix a sampling horizon and, at each decision time, compute an approximately optimal current action for that horizon and take it. We first discuss two notable previous efforts in this direction, the sampled look-ahead tree of Kearns et al. and the rollout algorithm of Bertsekas and Castanon, and then propose two sampling-based control techniques called “parallel rollout” and “hindsight optimization”. Parallel rollout generalizes the rollout algorithm of Bertsekas and Castanon, and hindsight optimization is motivated by Ginsberg's Monte Carlo card-play algorithm for computer bridge. In parallel rollout, we start with a small set of simple heuristic base policies that we wish to combine on-line into a single controller. The approach yields a policy that is provably no worse at each state than the best of the base policies at that state. In hindsight optimization, the utility of taking an action is upper bounded by the average, over many sampled traces, of the (possibly discounted) reward sum obtained by taking the action and then following the trace-relative optimal plan for the remaining horizon. The action with the highest utility upper bound is taken at each decision time. The utility estimate from hindsight optimization is an upper bound on the true utility, whereas the estimate from parallel rollout is a lower bound. As a “proof of concept” of parallel rollout and hindsight optimization, we formulate two resource allocation problems arising in telecommunication networks as partially observable MDPs: a buffer management problem and a multiclass packet scheduling problem with deadlines.
The key feature of these two approaches is that a given or learned stochastic model of network traffic can be incorporated effectively and tractably into on-line network control decisions. We compare our approaches with well-known non-sampling control policies and previously published sampling-based techniques, and show, using empirical results on simulated traffic, that they improve on several known alternatives.
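
The two estimators described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: a fully observed two-class queue (the thesis treats partially observable models), the class weights, the buffer capacity, the Bernoulli arrival rates, and the two base policies. The parallel rollout estimate is also simplified to the maximum over base policies of the sample-mean return on shared traces. The hindsight estimate solves each sampled trace exactly by brute-force recursion, which is only practical for short horizons.

```python
import random

WEIGHTS = (1.0, 3.0)   # hypothetical reward for serving one packet of each class
CAPACITY = 3           # hypothetical per-class buffer size
GAMMA = 0.95           # discount factor
ACTIONS = (0, 1)       # action a means "serve class a"

def step(state, action, arrivals):
    """Serve one packet of the chosen class, then admit arrivals up to CAPACITY."""
    queues = list(state)
    reward = 0.0
    if queues[action] > 0:
        queues[action] -= 1
        reward = WEIGHTS[action]
    nxt = tuple(min(CAPACITY, q + a) for q, a in zip(queues, arrivals))
    return nxt, reward

def sample_trace(rng, horizon, rates=(0.6, 0.3)):
    """One sampled arrival sequence: a Bernoulli arrival per class per step."""
    return [tuple(int(rng.random() < p) for p in rates) for _ in range(horizon)]

def policy_return(state, first_action, policy, trace):
    """Discounted reward of first_action followed by `policy` along one trace."""
    total, discount, action = 0.0, 1.0, first_action
    for arrivals in trace:
        state, reward = step(state, action, arrivals)
        total += discount * reward
        discount *= GAMMA
        action = policy(state)
    return total

def hindsight_return(state, first_action, trace):
    """Discounted reward of first_action followed by the best plan for this trace."""
    def best(s, t):
        if t == len(trace):
            return 0.0
        values = []
        for a in ACTIONS:
            nxt, r = step(s, a, trace[t])
            values.append(r + GAMMA * best(nxt, t + 1))
        return max(values)
    nxt, r = step(state, first_action, trace[0])
    return r + GAMMA * best(nxt, 1)

def mean(values):
    return sum(values) / len(values)

def parallel_rollout_q(state, action, policies, traces):
    """Simplified lower-bound estimate: best base policy's mean return after `action`."""
    return max(mean([policy_return(state, action, pi, tr) for tr in traces])
               for pi in policies)

def hindsight_q(state, action, traces):
    """Upper-bound estimate: mean of the per-trace optimal returns after `action`."""
    return mean([hindsight_return(state, action, tr) for tr in traces])

# Two hypothetical heuristic base policies.
def serve_high_weight(state):
    return 1 if state[1] > 0 else 0

def serve_longest(state):
    return 0 if state[0] >= state[1] else 1

def choose_action(q_of_action):
    """Receding-horizon step: take the action with the highest estimate."""
    return max(ACTIONS, key=q_of_action)

rng = random.Random(0)
traces = [sample_trace(rng, horizon=6) for _ in range(50)]
state = (2, 1)
base_policies = [serve_high_weight, serve_longest]
pr_action = choose_action(lambda a: parallel_rollout_q(state, a, base_policies, traces))
ho_action = choose_action(lambda a: hindsight_q(state, a, traces))
```

On any fixed set of traces, `hindsight_q` is at least `parallel_rollout_q` for every action, because the best plan for a trace earns at least as much on that trace as any base policy does. This mirrors the upper-bound and lower-bound relationship stated in the abstract. In a receding-horizon controller, only the chosen first action is executed, and the estimates are recomputed from fresh traces at the next decision time.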
