A Data-Deduplication-Based Matching Mechanism for URL Filtering

URL filtering plays an important role in many network security applications. It usually requires high matching performance, but the performance of classical multiple string matching algorithms has proven difficult to improve significantly. In this article, we observe that the online URLs to be filtered contain a large number of duplicates. Based on this observation, we propose a novel deduplication-based matching mechanism (DBM) for URL filtering. DBM caches information about duplicate URLs in a hash table so that the URL filtering system does not scan the same URL repeatedly. DBM can be used in conjunction with any multiple string matching algorithm. Experimental results show that when a multiple string matching algorithm is used together with DBM, the matching speed of the URL filtering system increases by 9% to 68%. DBM therefore significantly accelerates URL filtering. Moreover, because DBM is independent of the specific matching algorithm, it can easily be applied in other fields.
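The caching idea can be illustrated with a minimal Python sketch. It is written under stated assumptions and is not the authors' implementation. It assumes the cached "information" is the set of patterns each URL matched. The class name `DedupMatcher`, the LRU eviction policy, the `capacity` parameter, and the naive substring matcher, which stands in for a real multiple string matching algorithm such as Aho-Corasick, are all hypothetical. A URL seen before is answered from the hash table, and only unseen URLs are passed to the underlying matcher.

```python
from collections import OrderedDict
from typing import Callable, FrozenSet, Iterable


def naive_multi_match(url: str, patterns: Iterable[str]) -> FrozenSet[str]:
    """Hypothetical stand-in matcher: every pattern occurring as a substring of url.

    A real filter would plug in Aho-Corasick, Wu-Manber, etc. here.
    """
    return frozenset(p for p in patterns if p in url)


class DedupMatcher:
    """Hypothetical DBM-style wrapper around any matcher callable.

    Assumption: the cache maps a full URL to its set of matched patterns,
    and is bounded with least-recently-used (LRU) eviction.
    """

    def __init__(self, matcher: Callable[[str], FrozenSet[str]], capacity: int = 10_000):
        self._matcher = matcher
        self._cache: "OrderedDict[str, FrozenSet[str]]" = OrderedDict()
        self._capacity = capacity
        self.hits = 0   # URLs answered from the hash table
        self.scans = 0  # URLs actually scanned by the matcher

    def match(self, url: str) -> FrozenSet[str]:
        cached = self._cache.get(url)
        if cached is not None:  # an empty frozenset is still a valid cached result
            self.hits += 1
            self._cache.move_to_end(url)  # mark as most recently used
            return cached
        # Unseen URL: run the (expensive) multiple string matcher once.
        self.scans += 1
        result = self._matcher(url)
        self._cache[url] = result
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used URL
        return result


# Usage: a short stream in which one URL repeats.
patterns = ["malware", "phish", "ads."]
dbm = DedupMatcher(lambda u: naive_multi_match(u, patterns), capacity=2)
stream = ["http://ads.example.com/x", "http://ok.example.org/", "http://ads.example.com/x"]
results = [dbm.match(u) for u in stream]
print(dbm.scans, dbm.hits)  # 2 1
```

Because the wrapper depends only on a callable, any matching algorithm can be swapped in unchanged. This mirrors the claim that DBM is independent of the underlying algorithm. The benefit of such a cache grows with the fraction of duplicate URLs in the stream.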