Identifying Gambling and Porn Websites with Image Recognition

With the rapid development of the Internet, gambling and porn websites pose a growing threat to the health and development of young people. Website classification methods based on text content and URLs do not perform satisfactorily on gambling and porn websites, because the domain names of these sites change quickly. Meanwhile, visual website classification has achieved strong results in phishing website detection, which encourages us to apply it here. In this paper, we therefore introduce visual features to identify gambling and porn websites. First, we develop a website screenshot tool that saves the full content of a website as an image. Second, we use a Bag-of-Words (BoW) model to select effective features for recognizing screenshots of gambling and porn websites, and we choose appropriate parameters to improve classification efficiency. Finally, experimental results on the gambling and porn website datasets we collected show that the proposed method recognizes both kinds of website with satisfactory results.
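
As an illustration of the BoW step, the following minimal sketch turns the local descriptors of one screenshot into a normalized visual-word histogram. It is a sketch under stated assumptions, not the implementation used in this work: the function names, the two-dimensional toy descriptors, and the two-word codebook are all hypothetical, and the descriptor type and codebook construction are not specified here. In practice the codebook would typically come from clustering descriptors extracted from training screenshots, and the resulting histogram would be passed to a classifier.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook entry (visual word) closest to the descriptor."""
    return min(
        range(len(codebook)),
        key=lambda i: squared_distance(descriptor, codebook[i]),
    )


def bow_histogram(
    descriptors: Sequence[Vector], codebook: Sequence[Vector]
) -> List[float]:
    """Count how often each visual word occurs, then normalize to sum to 1."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        # A screenshot with no descriptors yields an all-zero histogram.
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Hypothetical toy data: a 2-word codebook and 4 descriptors from one screenshot.
toy_codebook = [(0.0, 0.0), (10.0, 10.0)]
toy_descriptors = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (11.0, 9.0)]
toy_histogram = bow_histogram(toy_descriptors, toy_codebook)
```

Because the histogram has a fixed length equal to the codebook size, screenshots of very different heights map to feature vectors of the same dimension, which is what makes full-page screenshots usable as classifier input.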
