SPIT caller detection with an unsupervised Random Forests classifier

As VoIP (Voice over IP) grows rapidly, unsolicited advertisement calls, referred to as SPIT (SPam over Internet Telephony), are expected to become widespread. SPIT detection is harder than email spam detection because neither the callee nor a SPIT detection system can tell whether a call is SPIT or legitimate until the callee actually answers it. Many recently proposed SPIT detection techniques work by finding outliers in call patterns. However, most of them depend on a threshold to decide whether a caller is legitimate, and a poorly chosen threshold can lead to a high false negative rate or a low true positive rate. This is because these techniques analyse call patterns using a single feature, e.g. call frequency or average call duration. In this paper, we propose a multi-feature call pattern analysis using an unsupervised Random Forests classifier, one of the most effective classification algorithms. We also propose two simple but useful features that improve classification. We show that Random Forests based classification is effective without supervised training data, and we identify which features contribute to the classification.
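The abstract does not say how a Random Forest is trained without labelled calls. One well-known construction (Shi and Horvath's unsupervised Random Forests) trains the forest to tell real records apart from a synthetic copy whose columns are permuted independently, then uses the fraction of trees in which two callers share a leaf as a similarity. The sketch below assumes that construction; it is not the authors' implementation. The per-caller features (calls per hour, mean call duration), the helper names `synthetic_copy`, `rf_proximity` and `k_medoids`, and all parameter values are hypothetical, scikit-learn's `RandomForestClassifier` is assumed as the forest, and a plain alternating k-medoids stands in for a full PAM clustering step.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def synthetic_copy(X, rng):
    # Permute each column independently: every feature keeps its marginal
    # distribution, but the dependence between features is destroyed.
    return np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])


def rf_proximity(X, n_trees=300, seed=0):
    """Unsupervised RF (assumed construction): real vs. synthetic, then proximities."""
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X, synthetic_copy(X, rng)])
    y_all = np.concatenate([np.ones(len(X)), np.zeros(len(X))])  # 1 = real, 0 = synthetic
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    rf.fit(X_all, y_all)
    leaves = rf.apply(X)  # shape (n_callers, n_trees): leaf index of each caller per tree
    # prox[i, k] = fraction of trees in which callers i and k land in the same leaf
    prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
    return prox, rf.feature_importances_


def k_medoids(D, k=2, n_iter=50):
    """Plain alternating k-medoids on a precomputed distance matrix (not full PAM)."""
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:  # greedy farthest-point initialisation
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:  # move the medoid to the most central member
                new[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return np.argmin(D[:, medoids], axis=1)


# Hypothetical per-caller features: [calls per hour, mean call duration in seconds]
rng = np.random.default_rng(1)
legit = np.column_stack([rng.normal(2, 0.5, 40), rng.normal(180, 30, 40)])
spit = np.column_stack([rng.normal(30, 5, 40), rng.normal(15, 5, 40)])
X = np.vstack([legit, spit])

prox, importances = rf_proximity(X)
labels = k_medoids(np.sqrt(1.0 - prox), k=2)  # 1 - proximity turned into a distance
print("feature importances:", importances)
print("cluster labels:", labels)
```

No caller label is used for training: the forest only learns which feature combinations look "real", so callers with similar multi-feature patterns end up sharing leaves. Because the forest sees several features at once, no single per-feature threshold has to be chosen, and `feature_importances_` gives a rough indication of which features drive the split.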
