Pangolin: speeding up concurrent messaging for cloud-based social gaming

The convergence of games and online social platforms is an exploding phenomenon. The continued success of social games hinges critically on the ability to deliver smooth and highly interactive experiences to end users. However, it is extremely challenging to satisfy the stringent performance requirements of online social games. Motivated by an Xbox Live online social gaming application, we address the problem of concurrent messaging, where the maximum latency of game messages has to be tightly bounded. Learning from a large-scale measurement experiment, we conclude that the generic transport protocol TCP, currently used in the game, cannot ensure concurrent messaging. We develop a new UDP-based transport protocol, named Pangolin. The core of Pangolin is an adaptive decision-making engine derived from Markov Decision Process (MDP) theory. The engine optimally controls the transmission of redundant Forward Error Correction (FEC) packets to combat data loss. Trace-driven emulation demonstrates that Pangolin reduces the 99.9th-percentile latency from more than 4 seconds to about 1 second with negligible overhead. Pangolin pre-computes all optimal actions and requires only a simple table look-up during online operation. Pangolin has been incorporated into the latest Xbox SDK, released in November 2010, and is now powering concurrent messaging for hundreds of thousands of Xbox clients.
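The idea of pre-computing optimal actions offline and reducing online operation to a table look-up can be illustrated with a deliberately small sketch. It is not Pangolin's actual model: the state (time slots left before a deadline), the independent per-packet loss probability, the single-packet message, the per-packet overhead cost, and the names `build_policy` and `redundant_packets` are all assumptions made for illustration. The sketch solves a toy finite-horizon MDP by backward induction and stores the best number of redundant packets for each state.

```python
def build_policy(horizon, loss_prob, max_redundant, packet_cost, miss_penalty=1.0):
    """Offline step: backward induction over a toy finite-horizon MDP.

    Hypothetical model (not taken from the paper):
      - state t = number of send slots left before the deadline
      - action k = number of redundant packets sent alongside the message
      - every packet is lost independently with probability loss_prob
      - the sender learns after each slot whether the message arrived
    """
    value = [miss_penalty]  # value[0]: deadline reached, message undelivered
    policy = [0]            # policy[0] is unused; no action is possible at t = 0
    for t in range(1, horizon + 1):
        best_k, best_cost = 0, float("inf")
        for k in range(max_redundant + 1):
            # The slot fails only if the message and all k copies are lost.
            fail = loss_prob ** (1 + k)
            cost = packet_cost * k + fail * value[t - 1]
            if cost < best_cost:
                best_k, best_cost = k, cost
        value.append(best_cost)
        policy.append(best_k)
    return policy, value


def redundant_packets(policy, slots_left):
    """Online step: a plain table look-up, no optimization at send time."""
    return policy[slots_left]


if __name__ == "__main__":
    policy, value = build_policy(horizon=3, loss_prob=0.1,
                                 max_redundant=3, packet_cost=0.05)
    for slots_left in range(1, 4):
        print(slots_left, redundant_packets(policy, slots_left))
```

In this toy setting the table sends redundancy only when the deadline is close, because earlier slots still leave room for a retry. A realistic engine would need a richer state (for example, measured loss and round-trip time), but the split between offline computation and online look-up would stay the same.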
