LEAP: Latency-, energy-, and area-optimized lookup pipeline

Table lookups and other types of packet processing require so much memory bandwidth that the networking industry has long been a major consumer of specialized memories like TCAMs. Extensive research in algorithms for longest prefix matching and packet classification has laid the foundation for lookup engines relying on area- and power-efficient random access memories. Motivated by costs and semiconductor technology trends, designs from industry and academia implement multi-algorithm lookup pipelines by synthesizing multiple functions into hardware, or by adding programmability. In existing proposals, programmability comes with significant overhead. We build on recent innovations in computer architecture that demonstrate the efficiency and flexibility of dynamically synthesized accelerators. In this paper we propose LEAP, a latency-, energy-, and area-optimized lookup pipeline based on an analysis of various lookup algorithms. We compare LEAP to PLUG, which relies on von Neumann-style programmable processing. We show that LEAP has equivalent flexibility by porting all lookup algorithms previously shown to work with PLUG. At the same time, LEAP reduces chip area by 1.5×, power consumption by 1.3×, and latency typically by 5×. Furthermore, programming LEAP is straightforward; we demonstrate an intuitive Python-based API.
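
For readers unfamiliar with the lookup problem the abstract refers to, the following is a minimal software sketch of longest prefix matching over a binary trie. It is an illustration only, not LEAP's Python API or any algorithm ported to PLUG; the names `TrieNode`, `insert`, and `lookup` are hypothetical, and the sketch assumes IPv4 addresses.

```python
import ipaddress


class TrieNode:
    """One node of a binary trie; illustrative only, not LEAP's data layout."""
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and child for bit 1
        self.next_hop = None          # set only if a prefix ends at this node


def insert(root, prefix, next_hop):
    """Add an IPv4 prefix such as '10.1.0.0/16' with its next hop."""
    net = ipaddress.ip_network(prefix)
    bits = int(net.network_address)
    node = root
    for i in range(net.prefixlen):
        bit = (bits >> (31 - i)) & 1  # walk from the most significant bit
        if node.children[bit] is None:
            node.children[bit] = TrieNode()
        node = node.children[bit]
    node.next_hop = next_hop


def lookup(root, address):
    """Return the next hop of the longest matching prefix, or None."""
    bits = int(ipaddress.ip_address(address))
    node = root
    best = root.next_hop  # a default route (/0), if any, lives at the root
    for i in range(32):
        node = node.children[(bits >> (31 - i)) & 1]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop  # remember the deepest match seen so far
    return best


root = TrieNode()
insert(root, "0.0.0.0/0", "default")
insert(root, "10.0.0.0/8", "A")
insert(root, "10.1.0.0/16", "B")
```

With this table, `lookup(root, "10.1.2.3")` selects the /16 entry over the shorter /8 and /0 entries, because the walk keeps the deepest prefix it passes.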
