Uncovering the Landscape of Fraud and Spam in the Telephony Channel

Robocalling, voice phishing, and caller ID spoofing are common cybercrime techniques used to launch scam campaigns through the telephony channel, which unsuspecting users have long trusted. A telephony honeypot is more reliable than online complaints: it provides complete, accurate, and timely information about unwanted phone calls across the United States. Our first goal is to provide a large-scale, data-driven analysis of the telephony spam and fraud ecosystem. Our second goal is to uniquely identify bad actors who may be operating several phone numbers. We collected about 40,000 unsolicited calls. Our results show that only a few bad actors, robocallers or telemarketers, are responsible for the majority of the spam and scam calls, and that they can be uniquely identified based on audio features from their calls. This discovery has major implications for law enforcement and businesses that are presently engaged in combating the rise of telephony fraud. In particular, since our system allows end users to detect fraudulent behavior and tie it back to existing fraud and spam campaigns, it can be used as a first step towards designing and deploying intelligent defense strategies.
