A High-Performance URL Lookup Engine for URL Filtering Systems

URL filtering systems provide a simple and effective way to prevent people from browsing undesirable or malicious websites. At the core of such a system is the URL lookup operation, which must be well designed. This paper proposes a high-performance URL lookup engine for URL filtering systems. The engine combines a URL compression algorithm with a Wu-Manber-like multiple string matching algorithm. With this combination, it achieves high URL lookup performance and memory-efficient storage of the ever-growing URL blacklist, while supporting prefix matching. Experiments with actual URL blacklists and requested URL sets show that our engine reduces the memory needed to store URL blacklists by about 80% and improves URL lookup time by 58% to 162% compared with state-of-the-art URL lookup methods.
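As a rough illustration of what prefix matching against a blacklist involves, the sketch below hashes a fixed-length window of the requested URL and then verifies the candidate patterns, which is the general hash-then-verify idea behind Wu-Manber-style matchers. It is not the engine proposed in this paper: the class name `PrefixBlacklist`, the choice of the shortest pattern length as the window, and the example URLs are all assumptions made for illustration, and the URL compression step is omitted entirely.

```python
from collections import defaultdict


class PrefixBlacklist:
    """Toy anchored prefix matcher (hypothetical, not the paper's engine)."""

    def __init__(self, patterns):
        patterns = [p for p in patterns if p]
        if not patterns:
            raise ValueError("blacklist needs at least one non-empty pattern")
        # Assumption: the window is the length of the shortest pattern, so
        # every pattern contributes exactly one window-sized key.
        self.window = min(len(p) for p in patterns)
        self.buckets = defaultdict(list)
        for p in patterns:
            self.buckets[p[:self.window]].append(p)

    def match(self, url):
        """Return the longest blacklisted prefix of url, or None."""
        # One hash lookup selects the candidates; a URL shorter than the
        # window cannot equal any key and therefore yields no candidates.
        candidates = self.buckets.get(url[:self.window], ())
        # Verification step: confirm each candidate is a true prefix.
        hits = [p for p in candidates if url.startswith(p)]
        return max(hits, key=len) if hits else None


# Example blacklist entries (made up for this sketch).
blacklist = PrefixBlacklist([
    "ads.example.com/",
    "example.com/track/",
    "example.com/track/pixel",
])
```

Calling `blacklist.match(url)` on a requested URL returns the longest blacklist entry that is a prefix of it, or `None` when the URL is not blocked. A real engine would additionally need the compact storage that the abstract attributes to URL compression.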
