Fraudulent Support Telephone Number Identification Based on Co-Occurrence Information on the Web

"Fraudulent support phones" are misleading telephone numbers, placed on Web pages or other media, that claim to provide services with which they are not associated. Most fraudulent support phone information is found on search engine result pages (SERPs), where it substantially degrades the search engine user experience. In this paper, we propose an approach that identifies fraudulent support telephone numbers on the Web using the co-occurrence relations between telephone numbers that appear on SERPs. We start from a small set of seed official support phone numbers and seed fraudulent numbers. We then construct a co-occurrence graph from the telephone numbers that appear together on Web pages. We also take page layout information into account, on the assumption that telephone numbers appearing in nearby page blocks are more closely related. Finally, we develop a propagation algorithm that diffuses the trust scores of the seed official support phone numbers and the distrust scores of the seed fraudulent numbers over the co-occurrence graph to detect additional fraudulent numbers. Experimental results on over 1.5 million SERPs produced by a popular Chinese commercial search engine indicate that our approach outperforms the TrustRank, Anti-TrustRank, and Good-Bad Rank algorithms, achieving an AUC value of over 0.90.
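
The pipeline described above (seed sets, a layout-weighted co-occurrence graph, and trust/distrust propagation) can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give: the sketch assumes a personalized-PageRank-style propagation, a hypothetical edge weight of 1 / (1 + block distance), and a final score defined as distrust minus trust. All function names, the input format, and the example phone numbers are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(pages):
    """Build a weighted co-occurrence graph.

    pages: list of pages; each page is a list of (phone, block_index) pairs.
    The weight 1 / (1 + block distance) is an assumption, chosen so that
    numbers in nearby page blocks are linked more strongly.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for page in pages:
        for (a, block_a), (b, block_b) in combinations(page, 2):
            if a == b:
                continue
            weight = 1.0 / (1 + abs(block_a - block_b))
            graph[a][b] += weight
            graph[b][a] += weight
    return graph


def propagate(graph, seeds, damping=0.85, iterations=50):
    """Diffuse a score from the seed nodes over the graph.

    Assumption: a personalized-PageRank-style update that restarts at the
    seeds and splits each node's score among neighbours by edge weight.
    """
    nodes = list(graph)
    seed_nodes = {s for s in seeds if s in graph}
    if not seed_nodes:
        return {n: 0.0 for n in nodes}
    base = {n: (1.0 / len(seed_nodes) if n in seed_nodes else 0.0) for n in nodes}
    score = dict(base)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * base[n] for n in nodes}
        for n in nodes:
            total = sum(graph[n].values())
            for m, weight in graph[n].items():
                nxt[m] += damping * score[n] * weight / total
        score = nxt
    return score


def fraud_scores(graph, official_seeds, fraud_seeds):
    """Higher values suggest a number is more likely fraudulent.

    Combining the two propagations as distrust minus trust is an assumption.
    """
    trust = propagate(graph, official_seeds)
    distrust = propagate(graph, fraud_seeds)
    return {n: distrust[n] - trust[n] for n in graph}


if __name__ == "__main__":
    # Hypothetical toy data: three pages, with the block index of each number.
    pages = [
        [("400-OFFICIAL", 0), ("400-A", 0)],
        [("800-FRAUD", 0), ("800-B", 1)],
        [("400-A", 0), ("800-B", 3)],
    ]
    graph = build_graph(pages)
    scores = fraud_scores(graph, {"400-OFFICIAL"}, {"800-FRAUD"})
    for phone, value in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(phone, round(value, 3))
```

In this toy example, the unlabeled number that shares a page with the fraudulent seed ranks as more suspicious than the one that shares a page with the official seed. Ranking all numbers by such a score is what allows evaluation with a threshold-free metric such as AUC.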
