Multi-pattern matching algorithm based on MapReduce and Hadoop

Large data sets present new challenges for virus scanning, and parallel scanning is a promising remedy. This research builds on MapReduce and the Hadoop platform and aims to improve virus-scanning efficiency by parallelizing the Aho-Corasick (AC) multi-pattern matching algorithm. Experiments show that, for large data sets, parallel scanning is more efficient than traditional stand-alone scanning.
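
The abstract does not spell out how the AC algorithm is parallelized, so the following is only a minimal sketch of one plausible decomposition, not the authors' implementation. It assumes the input is cut into fixed-size splits, that each split is extended by (longest pattern length - 1) characters so matches crossing a split boundary are not lost, and that a map step emits (pattern, position) pairs which a reduce step groups by pattern. The map and reduce steps are simulated in-process in plain Python rather than run through the Hadoop API, and all function names (`build_automaton`, `map_scan`, `reduce_hits`, `parallel_scan`) are hypothetical.

```python
from collections import deque, defaultdict

def build_automaton(patterns):
    """Build Aho-Corasick goto, failure, and output tables."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit matches ending at the suffix state
    return goto, fail, out

def search(text, automaton):
    """Yield (start_index, pattern) for every match in text."""
    goto, fail, out = automaton
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            yield i - len(pat) + 1, pat

def make_splits(data, split_size, overlap):
    """Yield (offset, chunk); each chunk carries `overlap` extra characters."""
    for start in range(0, len(data), split_size):
        yield start, data[start:start + split_size + overlap]

def map_scan(offset, chunk, split_size, automaton):
    """Map step (assumed design): emit (pattern, absolute_position) pairs."""
    for pos, pat in search(chunk, automaton):
        # Keep only matches that start inside this split's own range;
        # matches starting in the overlap belong to the next split.
        if pos < split_size:
            yield pat, offset + pos

def reduce_hits(pairs):
    """Reduce step (assumed design): group positions by pattern."""
    grouped = defaultdict(list)
    for pat, pos in pairs:
        grouped[pat].append(pos)
    return {pat: sorted(positions) for pat, positions in grouped.items()}

def parallel_scan(data, patterns, split_size):
    """Simulate the map and reduce phases sequentially in one process."""
    automaton = build_automaton(patterns)
    overlap = max(len(p) for p in patterns) - 1
    pairs = []
    for offset, chunk in make_splits(data, split_size, overlap):
        pairs.extend(map_scan(offset, chunk, split_size, automaton))
    return reduce_hits(pairs)

signatures = ["he", "she", "his", "hers"]  # stand-ins for virus signatures
hits = parallel_scan("ushers and his shell", signatures, split_size=4)
```

In this sketch the automaton is built once and shared by every map call. On a real cluster it would have to be shipped to each mapper, and the overlap handling would depend on how the input format exposes split boundaries; both points are assumptions here rather than details taken from the paper.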
