Communication Behaviour-Based Big Data Application to Classify and Detect HTTP Automated Software

HTTP is recognized as the most widely used protocol on the Internet, as developers move more and more applications onto the web. As computer systems grow increasingly complex, a diverse range of HTTP automated software (autoware) thrives. Unfortunately, alongside normal autoware, HTTP malware and greyware are also spreading rapidly in the web environment. Consequently, network communication is no longer driven solely by user intent. This raises the demand for analyzing HTTP autoware communication behaviour in order to detect and classify malicious and normal activities in HTTP traffic. In this paper, building on prior studies and on an analysis of autoware communication behaviour through an access graph, we present a new method to detect and classify HTTP autoware communication at the network level. The proposed system combines Hadoop MapReduce and the MarkLogic NoSQL database with XQuery to handle the huge volume of HTTP traffic generated each day in a large network. The method is evaluated on real outbound HTTP traffic collected through the proxy server of a private network. Experimental results are promising: 95.1% of suspicious autoware are classified and detected. This finding may help network and system administrators to inspect internal threats caused by HTTP autoware at an early stage.
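As a rough illustration of the kind of aggregation involved, the following sketch builds a simple access graph (client to destination host, weighted by request count) from proxy log records using a map step and a reduce step. It is not the system described above: the record fields `client` and `host`, the function names, and the choice of request count as the edge weight are all assumptions made for illustration, and the MapReduce shuffle is emulated locally with a sort and group rather than run on Hadoop.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_record(record):
    """Map step: emit ((client, host), 1) for one proxy log record.

    The 'client' and 'host' field names are hypothetical.
    """
    yield (record["client"], record["host"]), 1


def reduce_counts(key, values):
    """Reduce step: total number of requests for one (client, host) edge."""
    return key, sum(values)


def build_access_graph(records):
    """Return {client: {host: request_count}} for the given records."""
    # Emulate the shuffle phase locally: sort emitted pairs by key, then group.
    pairs = sorted(kv for record in records for kv in map_record(record))
    graph = defaultdict(dict)
    for key, group in groupby(pairs, key=itemgetter(0)):
        (client, host), total = reduce_counts(key, (value for _, value in group))
        graph[client][host] = total
    return dict(graph)


if __name__ == "__main__":
    # Made-up sample records, for demonstration only.
    sample = [
        {"client": "10.0.0.5", "host": "updates.example.com"},
        {"client": "10.0.0.5", "host": "updates.example.com"},
        {"client": "10.0.0.5", "host": "news.example.org"},
        {"client": "10.0.0.9", "host": "updates.example.com"},
    ]
    for client, hosts in build_access_graph(sample).items():
        print(client, hosts)
```

Per-edge counts like these are only a starting point; any features actually used to separate malicious from normal autoware would have to come from the full paper.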
