Paper information - A framework for fast and fair evaluation of automata processing hardware

Programming Micron's Automata Processor (AP) requires expertise in both automata theory and the AP architecture, because programmers must manually manipulate state transition elements (STEs) and their transitions using the low-level Automata Network Markup Language (ANML). When the STEs required by an application exceed the hardware capacity, multiple reconfigurations are needed. However, most previous AP-based designs limit the dataset size so that it fits on a single AP board and simply neglect the costly overhead of reconfiguration. This results in unfair performance comparisons between the AP and other processors. To address this issue, we propose a framework for the fast and fair evaluation of AP devices. Our framework provides a hierarchical approach that automatically generates automata for large datasets through user-defined paradigms and allows the use of cascadable macros to achieve highly optimized reconfigurations. We highlight the importance of counting the configuration time in the overall AP performance, which, in turn, can provide better insight into identifying essential hardware features, specifically for large problem sizes. Our framework shows that the AP can achieve up to a 461x overall speedup in a fair comparison with its CPU counterparts.
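
To make the point about configuration time concrete, the following is a minimal sketch of how a speedup figure changes once reconfiguration overhead is counted. It is not the paper's framework or its cost model: the function name `ap_speedup`, its parameters, and all numbers are hypothetical, and it assumes that every pass over the board costs one full reconfiguration plus the same compute time. It does not model the cascadable-macro optimizations the abstract mentions.

```python
import math


def ap_speedup(required_stes, board_capacity, config_s, compute_s, cpu_s):
    """Return (naive, fair) speedups of an AP over a CPU baseline.

    All parameters are illustrative assumptions, not measurements:
      required_stes  - STEs needed by the whole dataset
      board_capacity - STEs that fit in one board configuration
      config_s       - seconds for one (re)configuration
      compute_s      - seconds of matching per configuration
      cpu_s          - seconds the CPU baseline needs for the same dataset
    """
    # Number of configurations needed to cover the whole dataset.
    passes = math.ceil(required_stes / board_capacity)
    compute_total = passes * compute_s
    # Naive view: only the matching time on the AP is counted.
    naive = cpu_s / compute_total
    # Fair view: every pass also pays the configuration cost.
    fair = cpu_s / (compute_total + passes * config_s)
    return naive, fair


# Hypothetical numbers chosen only to show the effect.
naive, fair = ap_speedup(
    required_stes=1_000_000,
    board_capacity=250_000,
    config_s=0.05,
    compute_s=0.01,
    cpu_s=12.0,
)
print(f"naive speedup: {naive:.1f}x, fair speedup: {fair:.1f}x")
```

With these made-up inputs the dataset needs four configurations, and the reported speedup drops substantially once configuration time is included, which is the kind of gap the abstract attributes to neglecting reconfiguration.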
