A framework for improving the performance of application servers in next generation networks

Next generation networks (NGNs) such as the IP Multimedia Subsystem (IMS) are built entirely on the Internet Protocol (IP) suite. This has made IP the de facto standard for data networking, voice over IP (VoIP), and media-rich applications such as streaming multimedia, ringtones, multi-player gaming, and high-definition video conferencing for remote interaction. A primary feature of such converged networks is that they use the same IP-based network to deliver voice, video, and data simultaneously. These services are provided by application servers built from industry-standard Advanced Telecom Computing Architecture (ATCA) blade computing units running commodity open source operating systems such as Linux, xBSD, and OpenSolaris. However, real-time and latency-sensitive applications such as streaming multimedia require that the entire packet delivery path, from the originating server to the end host, be properly configured to avoid unnecessary delay and jitter in data transfer. The ease of deploying NGNs therefore comes with a challenge: rich multimedia applications must be delivered without the separate voice and data paths that exist in the circuit-switched public switched telephone network (PSTN). Packet delivery in such converged architectures involves interaction between the storage disks, the operating system (OS), the network interface cards (NICs), and the various switches and routers, each of which can independently introduce delay into data transfer. In this dissertation, we focus on understanding and improving the performance of application servers that reside in high-traffic content delivery networks (CDNs) and host latency-sensitive applications with heavy I/O requirements.
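The paragraph above names delay and jitter as the quantities that latency-sensitive traffic cares about most. One common way to quantify jitter, not necessarily the one used in this dissertation, is the running interarrival jitter estimator that RFC 3550 (section 6.4.1) defines for RTP. The minimal sketch below assumes per-packet send and receive timestamps expressed in one common unit; the function name is hypothetical.

```python
from typing import Sequence


def interarrival_jitter(send_times: Sequence[float], recv_times: Sequence[float]) -> float:
    """Smoothed jitter estimate in the style of RFC 3550, section 6.4.1.

    send_times[i] and recv_times[i] are the send and receive timestamps of
    packet i, in the same unit. Returns the final jitter estimate J.
    """
    if len(send_times) != len(recv_times):
        raise ValueError("timestamp sequences must have equal length")
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D(i-1, i): change in one-way transit time between consecutive packets.
        transit_prev = recv_times[i - 1] - send_times[i - 1]
        transit_curr = recv_times[i] - send_times[i]
        d = abs(transit_curr - transit_prev)
        # Exponential smoothing with gain 1/16, as RFC 3550 specifies.
        jitter += (d - jitter) / 16.0
    return jitter
```

A constant transit time yields zero jitter, while a single 16-unit change in transit time raises the estimate by one unit. The gain of 1/16 damps isolated spikes so that the estimate tracks sustained variation.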
We start by identifying an architectural framework for traffic characterization that is expected to provide insights into the composition and dynamics of network traffic in CDNs (e.g., average packet size, data rate, and protocol composition). Once the nature and type of the network traffic arriving at the NICs have been identified, we attempt to identify packet processing bottlenecks caused by the interaction between the NICs, the OS, and the underlying hardware. We propose a closed-form queuing model that characterizes the packet processing capabilities of the NICs given the available computing resources. We show that there are limits beyond which a computing unit cannot process packets without overloading the CPU. Since the performance of latency-sensitive processes can be degraded by storage network delays and by OS dynamics, we present solutions for prioritizing reader processes and tuning the pagedaemon in open source operating systems. In our implementation in the NetBSD kernel, we observed an improvement of approximately 15% to 20% in the transactions per second (TPS) of latency-sensitive applications. Finally, we believe that our framework and our approach of identifying the basic components of network data transfer mechanisms are for the most part generic, and can be used for performance tuning and for deploying application servers in NGNs with a variety of services.
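The link between traffic characterization and CPU limits can be illustrated with a back-of-the-envelope calculation. The sketch below uses the textbook M/M/1 queue, not the closed-form model proposed in this dissertation. The packet arrival rate follows from the measured data rate R and average packet size L as lambda = R / (8L). With an assumed per-packet CPU cost S, utilization is rho = lambda * S, the queue is stable only while rho < 1, and the mean time in system is S / (1 - rho). The class, field, and function names and the per-packet cost are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NicLoad:
    """Hypothetical inputs for a rough NIC/CPU capacity check."""
    data_rate_bps: float     # offered load at the NIC, in bits per second
    avg_packet_bytes: float  # average packet size from traffic characterization
    cpu_cost_s: float        # assumed CPU time to process one packet, in seconds


def packet_rate(load: NicLoad) -> float:
    """Arrival rate lambda (packets/s) implied by data rate and mean packet size."""
    return load.data_rate_bps / (8.0 * load.avg_packet_bytes)


def mm1_metrics(load: NicLoad) -> tuple[float, float]:
    """Return (utilization rho, mean time in system T) under M/M/1 assumptions.

    rho = lambda * S. If rho >= 1 the CPU is overloaded, the queue grows
    without bound, and T is reported as infinite.
    """
    rho = packet_rate(load) * load.cpu_cost_s
    if rho >= 1.0:
        return rho, float("inf")
    return rho, load.cpu_cost_s / (1.0 - rho)


def max_packet_rate(load: NicLoad, target_rho: float = 1.0) -> float:
    """Highest packet rate sustainable at a given utilization ceiling."""
    return target_rho / load.cpu_cost_s
```

For example, 1 Gb/s of traffic with 1,250-byte packets corresponds to 100,000 packets/s; with an assumed 5 microseconds of CPU time per packet, utilization is 0.5 and the CPU saturates at 200,000 packets/s. Passing a `target_rho` below 1 gives a more conservative operating limit, since M/M/1 delay grows sharply as utilization approaches 1.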