QTCP: Adaptive Congestion Control with Reinforcement Learning

Next-generation network access technologies and Internet applications have made it increasingly difficult for traditional congestion control protocols to deliver satisfactory quality of experience. Efforts to optimize TCP by tailoring its core congestion control method to specific network architectures or applications do not generalize well across a wide range of network scenarios. This limitation stems from their rule-based design: performance is tied to a predetermined mapping from observed network states to corresponding actions, so these protocols can neither adapt to new environments nor learn from experience. We address this problem with QTCP, an approach that integrates a Q-learning reinforcement learning framework into TCP, enabling senders to gradually learn the optimal congestion control policy online. Because QTCP needs no hard-coded rules, it can generalize to a variety of networking scenarios. Moreover, we develop a generalized Kanerva coding function approximation algorithm that reduces both the computational complexity of the value functions and the size of the state space to be searched. We show that QTCP outperforms traditional rule-based TCP, providing 59.5 percent higher throughput while maintaining low transmission latency.
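
The abstract pairs a standard Q-learning update with Kanerva coding to keep the value function tractable. The sketch below is a minimal, self-contained illustration of that pairing, not the paper's implementation: the state features, action set, prototype count, distance threshold, and hyperparameters are all assumptions made for this example.

```python
import random

# Hypothetical action set: per-interval congestion-window adjustments
# (decrease, hold, increase). The paper's exact action granularity may differ.
ACTIONS = (-1, 0, +1)

class KanervaApproximator:
    """Kanerva coding: a state activates the prototypes within a Hamming-
    distance threshold, and Q(s, a) is the sum of the active prototypes'
    weights for action a. This bounds memory by the prototype count rather
    than by the full state-space size."""

    def __init__(self, n_prototypes=200, n_features=3, n_bins=10,
                 threshold=1, seed=0):
        rng = random.Random(seed)
        self.threshold = threshold
        # Prototypes are random points in the discretized feature space.
        self.prototypes = [tuple(rng.randrange(n_bins) for _ in range(n_features))
                           for _ in range(n_prototypes)]
        self.weights = {(i, a): 0.0
                        for i in range(n_prototypes) for a in ACTIONS}

    def active(self, state):
        """Indices of prototypes close to `state` in Hamming distance."""
        return [i for i, p in enumerate(self.prototypes)
                if sum(f != g for f, g in zip(state, p)) <= self.threshold]

    def q(self, state, action, active=None):
        idx = self.active(state) if active is None else active
        return sum(self.weights[(i, action)] for i in idx)

    def update(self, state, action, target, alpha=0.1):
        """Move Q(s, a) toward `target`, splitting the step evenly
        across the active prototypes."""
        idx = self.active(state)
        if not idx:
            return
        step = alpha * (target - self.q(state, action, idx)) / len(idx)
        for i in idx:
            self.weights[(i, action)] += step

def q_learning_step(approx, state, action, reward, next_state, gamma=0.9):
    """One backup: Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))."""
    best_next = max(approx.q(next_state, a) for a in ACTIONS)
    approx.update(state, action, reward + gamma * best_next)

def choose_action(approx, state, epsilon=0.1):
    """Epsilon-greedy choice over the cwnd adjustments (online exploration)."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: approx.q(state, a))

# Example of one decision interval. In a real sender, `state` would be built
# from per-interval measurements (e.g., binned ACK interarrival time, packet
# send interval, RTT ratio) and `reward` from a throughput/latency utility.
approx = KanervaApproximator()
state, next_state = (3, 4, 2), (3, 4, 1)
action = choose_action(approx, state)
q_learning_step(approx, state, action, reward=1.5, next_state=next_state)
```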
