An Algorithm of Large-Scale Approximate Multiple String Matching for Network Security

Payload checking has become a basic technique in network security applications, where exact string matching is widely used. However, as the contest between attackers and defenders moves further into payload obfuscation, approximate string matching is needed, in particular large-scale approximate multiple string matching. In this paper, we propose a practical algorithm, LargePEX, for large-scale approximate multiple string matching based on edit distance. The algorithm extends PEX, an approximate single string matching algorithm built on the idea of filtering and verification. LargePEX is tuned for large-scale matching through a fine-grained analysis of each step. Experiments are presented to verify the efficiency of LargePEX. The results show that, for a set of 10k strings, the average network payload checking speed of the algorithm reaches 25 MBps to 40 MBps (200 Mbps to 320 Mbps), which is sufficient for 100 Mbps Ethernet. With upgraded hardware, the algorithm is also practical for Gigabit Ethernet. LargePEX thus gives defenders a new way to develop more effective payload checking methods for protecting valuable resources and preventing intrusions.
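To make the filtering-and-verification idea concrete, the following is a minimal Python sketch of a PEX-style search applied naively to several patterns. It is not the LargePEX algorithm itself. It relies on the standard pigeonhole argument behind PEX: if a pattern occurs with at most k edit errors and is split into k + 1 pieces, at least one piece must occur unchanged. The function names (`split_pattern`, `best_match_distance`, `search`), the sample patterns, and the sample payload are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def split_pattern(pattern, k):
    """Split pattern into k + 1 contiguous pieces of near-equal length."""
    parts = k + 1
    base, extra = divmod(len(pattern), parts)
    pieces, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        pieces.append((start, pattern[start:start + length]))
        start += length
    return pieces


def best_match_distance(pattern, window):
    """Smallest edit distance between pattern and any substring of window."""
    prev = list(range(len(pattern) + 1))  # column for an empty text prefix
    best = prev[-1]
    for ch in window:
        cur = [0]  # a match may start at any text position
        for i, pc in enumerate(pattern, 1):
            cur.append(min(prev[i - 1] + (pc != ch),  # match or substitution
                           prev[i] + 1,               # extra text character
                           cur[i - 1] + 1))           # missing pattern character
        prev = cur
        best = min(best, prev[-1])
    return best


def search(text, patterns, k):
    """Return sorted indices of patterns that occur in text with <= k errors."""
    # Filtering index: exact piece -> list of (pattern id, offset in pattern).
    index = defaultdict(list)
    for pid, p in enumerate(patterns):
        if len(p) <= k:
            raise ValueError("each pattern must be longer than k")
        for offset, piece in split_pattern(p, k):
            index[piece].append((pid, offset))

    hits = set()
    for piece, owners in index.items():
        pos = text.find(piece)
        while pos != -1:
            for pid, offset in owners:
                p = patterns[pid]
                # Verification: check only the neighbourhood of the exact hit.
                lo = max(0, pos - offset - k)
                hi = min(len(text), pos - offset + len(p) + k)
                if best_match_distance(p, text[lo:hi]) <= k:
                    hits.add(pid)
            pos = text.find(piece, pos + 1)
    return sorted(hits)


patterns = ["attack", "malware", "exploit"]      # hypothetical signatures
payload = "GET /atXack?x=malwar HTTP"            # hypothetical payload
print(search(payload, patterns, 1))
```

This sketch looks up each piece with a separate `str.find` scan, which would be far too slow for a set of 10k strings. A large-scale design would presumably search for all pieces in a single pass with an exact multi-pattern matcher, but that choice is an assumption here and is not taken from the text above.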