Online Learning Based Congestion Control for Adaptive Multimedia Transmission

Growing Internet application requirements, such as throughput and delay, have spurred the need for transport protocols with flexible transmission control. Current TCP congestion control adopts an Additive Increase Multiplicative Decrease (AIMD) algorithm that linearly increases or multiplicatively decreases the congestion window based on transmission acknowledgments. In this paper, we propose an AIMD-like media-aware congestion control that determines the optimal congestion window updating policy for multimedia transmission. The media-aware congestion control problem is formulated as a Partially Observable Markov Decision Process (POMDP) whose objective is to maximize the long-term expected quality of the received multimedia application. The solution of this POMDP gives a policy adapted to the characteristics of multimedia applications (i.e., the distortion impacts and delay deadlines of multimedia packets). Obtaining the optimal congestion policy, however, requires the sender to have complete statistical knowledge of both the multimedia traffic and the network environment, which may not be available in practice. Hence, we incorporate online reinforcement learning into the POMDP-based solution, which allows the sender to estimate the environment and to adapt the source to network variations on the fly. Simulation results show that the proposed online learning approach can significantly improve the received video quality while maintaining the responsiveness and TCP-friendliness of the congestion control in various network scenarios.
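As a point of reference for the AIMD rule mentioned above, the following is a minimal sketch of a generic AIMD window update, not the paper's media-aware policy. The names `aimd_update` and `run_aimd`, the parameters `alpha` and `beta`, and their default values are assumptions chosen for illustration. In an AIMD-like scheme of the kind the abstract describes, a learned policy would presumably select such parameters from the observed state, whereas here they are fixed.

```python
def aimd_update(cwnd, acked, alpha=1.0, beta=0.5, min_cwnd=1.0):
    """Return the next congestion window (in packets) under plain AIMD.

    acked=True  -> additive increase: cwnd grows by alpha.
    acked=False -> multiplicative decrease: cwnd is scaled by beta,
                   but never drops below min_cwnd.
    alpha, beta, and min_cwnd are illustrative assumptions.
    """
    if acked:
        return cwnd + alpha
    return max(min_cwnd, cwnd * beta)


def run_aimd(events, cwnd=1.0, **params):
    """Apply aimd_update over a sequence of ack (True) / loss (False) events.

    Returns the list of window sizes after each event.
    """
    trace = []
    for acked in events:
        cwnd = aimd_update(cwnd, acked, **params)
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    # Three acknowledged rounds, one loss, then one more acknowledged round.
    print(run_aimd([True, True, True, False, True]))
```

Keeping the update as a pure function of the current window and the feedback makes it easy to swap the fixed `alpha` and `beta` for values chosen per step by some policy, which is the degree of freedom an AIMD-like controller exposes.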
