An Efficient Parallelized L7-Filter Design for Multicore Servers

L7-filter is a significant deep packet inspection (DPI) extension to Netfilter in Linux's QoS framework. It classifies network traffic based on information hidden in the packet payload. Although the computationally intensive payload classification can be accelerated with multiple processors, the default OS scheduler is oblivious to both the software characteristics and the underlying multicore architecture. In this paper, we present a parallelized L7-filter algorithm and an efficient scheduling technique for multicore servers. Our multithreaded L7-filter algorithm can process the incoming packets on multiple servers, boosting throughput substantially. Our scheduling algorithm is based on Highest Random Weight (HRW), which maintains connection locality for the incoming traffic but guarantees load balance only at the connection level. We present an Adapted Highest Random Weight (AHRW) algorithm that enhances HRW by applying packet-level load balancing with an additional feedback vector corresponding to the queue length at each processor. We further introduce a Hierarchical AHRW (AHRW-tree) algorithm that accounts for characteristics of the multicore architecture, such as cache and hardware topology, by organizing the hash as a tree. The algorithm reduces the scheduling overhead from O(N) to O(log N) and strikes a better balance between locality and load balancing. Results show that the AHRW-tree scheduler can improve the L7-filter throughput by about 50% on a Sun-Niagara-2-based server compared to a connection locality-based scheduler. Although tested extensively on L7-filter traces, our technique is applicable to many other packet processing applications in which connection locality and load balancing are important when executing on multiple processors. With these speedups and inherent software flexibility, our design and implementation provide a cost-effective alternative to traffic monitoring and filtering ASICs.
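The three scheduling ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the AHRW feedback formula or the tree layout, so the `threshold`-based feedback factor, the nested-list tree, and the names `weight`, `hrw_pick`, `ahrw_pick`, and `tree_pick` are all assumptions made for illustration. For brevity, the tree lookup shows only the hierarchical hashing, without queue-length feedback.

```python
import hashlib


def weight(flow_id, label):
    """Deterministic pseudo-random 64-bit weight for a (flow, target) pair."""
    digest = hashlib.sha256(f"{flow_id}|{label}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def hrw_pick(flow_id, cores):
    """Plain HRW: the core with the highest weight wins (O(N) per lookup).

    Every packet of a flow maps to the same core, which keeps connection
    locality but balances load only at the connection level.
    """
    return max(cores, key=lambda core: weight(flow_id, str(core)))


def ahrw_pick(flow_id, cores, queue_len, threshold=8):
    """HRW with a queue-length feedback vector (assumed form, for illustration).

    A core keeps its full weight while its queue is at or below `threshold`;
    beyond that its weight is scaled down in proportion to the backlog, so
    packets drift away from overloaded cores.
    """
    def adjusted(core):
        backlog = queue_len[core]
        feedback = 1.0 if backlog <= threshold else threshold / backlog
        return weight(flow_id, str(core)) * feedback

    return max(cores, key=adjusted)


def tree_pick(flow_id, node):
    """Hierarchical HRW over a nested list whose leaves are core ids.

    Each level hashes only among its own children, so a balanced tree with a
    fixed branching factor needs O(log N) weight evaluations.
    """
    path = "root"
    while isinstance(node, list):
        index = max(range(len(node)),
                    key=lambda i: weight(flow_id, f"{path}/{i}"))
        path = f"{path}/{index}"
        node = node[index]
    return node


if __name__ == "__main__":
    flow = "10.0.0.1:1234->10.0.0.2:80"      # hypothetical flow identifier
    cores = list(range(8))
    queues = {core: 0 for core in cores}

    home = hrw_pick(flow, cores)
    queues[home] = 10**12                     # simulate a badly backed-up core
    print("HRW core:", home)
    print("AHRW core under overload:", ahrw_pick(flow, cores, queues))
    print("Tree core:", tree_pick(flow, [[0, 1], [2, 3]]))
```

With all queues at or below the threshold, `ahrw_pick` behaves like `hrw_pick`, so locality is preserved until a core actually falls behind. Grouping cores in the tree by shared cache or pipeline is one way the hierarchy could reflect hardware topology, though the grouping used here is arbitrary.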
