A MULTI-CHARACTER TRANSITION STRING MATCHING ARCHITECTURE BASED ON AHO-CORASICK ALGORITHM

A hardware string matching architecture is commonly used to accelerate string matching in applications that need to filter content at high speed, such as intrusion detection systems. However, the throughput of a hardware string matching architecture that inspects data character by character is limited by the highest achievable clock rate. In this paper, we present a string matching architecture based on the Aho-Corasick algorithm. The proposed architecture inspects multiple characters simultaneously, multiplying the string matching throughput. We first describe an intuitive algorithm that constructs a multi-character finite state machine (FSM), which accepts multiple characters per transition, from an Aho-Corasick prefix tree (AC-trie). We then propose an architecture for multi-character transition string matching that consists of multiple matching units for processing the transition rules generated from the derived multi-character FSM. The design of the proposed architecture exploits the properties of the failure links of an AC-trie to reduce the transition rules derived from the failure functions linked to the initial state. As a result, the number of derived multi-character transition rules grows at a moderate rate as the number of characters inspected at a time increases. The proposed architecture was implemented on an ASIC device for evaluation, and it achieves a throughput of 4.5 Gbps for a 4-character string matching implementation operating at a 142 MHz clock.
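To make the idea of a multi-character transition concrete, the following is a minimal software sketch, not the construction or the hardware design proposed in this paper. It builds a standard Aho-Corasick automaton and then treats a block of k characters as a single transition by composing k single-character steps, recording matches that end inside the block. The function names (`build_ac`, `step`, `multi_step`, `search`, `throughput_gbps`) are hypothetical, the reduction of rules via failure links to the initial state is not modelled, and the throughput helper assumes 8-bit characters with one k-character block consumed per clock cycle.

```python
from collections import deque


def build_ac(patterns):
    """Build an Aho-Corasick automaton: goto table, failure links, outputs."""
    goto = [{}]     # goto[state][char] -> next state (trie edges)
    fail = [0]      # failure link of each state
    out = [set()]   # patterns recognised on entering each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    queue = deque(goto[0].values())  # depth-1 states fail to the initial state
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]   # inherit matches that end at the suffix state
            queue.append(t)
    return goto, fail, out


def step(goto, fail, s, ch):
    """Single-character transition, following failure links when needed."""
    while s and ch not in goto[s]:
        s = fail[s]
    return goto[s].get(ch, 0)


def multi_step(goto, fail, out, s, block):
    """Consume a whole block as one transition.

    Returns (next_state, matches), where each match is
    (offset_of_last_character_within_block, pattern).
    """
    matches = []
    for i, ch in enumerate(block):
        s = step(goto, fail, s, ch)
        matches.extend((i, p) for p in sorted(out[s]))
    return s, matches


def search(patterns, text, k):
    """Scan text k characters per transition; report (end_index, pattern)."""
    goto, fail, out = build_ac(patterns)
    s, found = 0, []
    for base in range(0, len(text), k):
        s, matches = multi_step(goto, fail, out, s, text[base:base + k])
        found.extend((base + i, p) for i, p in matches)
    return found


def throughput_gbps(k, clock_mhz, bits_per_char=8):
    """Ideal throughput, assuming one k-character block per clock cycle."""
    return k * bits_per_char * clock_mhz / 1000.0
```

Under these assumptions, `search(["he", "she", "his", "hers"], "ushers", 4)` reports the same matches as the character-by-character scan with `k=1`, and `throughput_gbps(4, 142)` evaluates to 4.544, consistent with the 4.5 Gbps figure quoted above. In hardware, the k composed steps would be precomputed into multi-character transition rules rather than evaluated sequentially as they are here.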