Design of multi-view based email classification for IoT systems via semi-supervised learning

Abstract Suspicious emails are a major threat to Internet of Things (IoT) security: they aim to induce users to click a link that redirects them to a phishing webpage. To protect IoT systems, email classification is an essential mechanism for separating spam from legitimate emails. In the literature, most email classification approaches adopt supervised learning algorithms that require a large amount of labeled data for classifier training. However, data labeling is time consuming and expensive, so only a very small set of labeled data is available in practice, which can greatly degrade the effectiveness of email classification. To mitigate this problem, we develop an email classification approach based on multi-view disagreement-based semi-supervised learning. The idea is that a multi-view method can offer richer information for classification, which is often overlooked in the literature, while semi-supervised learning can leverage both labeled and unlabeled data. In the evaluation, we investigate the performance of our approach on two datasets and in a real network environment. Experimental results demonstrate that multi-view data yields more accurate email classification than single-view data, and that our approach is more effective than several existing similar algorithms.
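
To make the general idea concrete, the following is a minimal sketch of two-view co-training, one common form of disagreement-based semi-supervised learning. It is not the algorithm evaluated in this work. The nearest-centroid classifier, the margin-based confidence, the function names, and the toy feature values are all assumptions chosen for illustration, and the sketch assumes the labeled seed set contains at least one email of each class.

```python
from statistics import mean


def fit_centroids(rows, labels):
    """Nearest-centroid model: one mean feature vector per class."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        model[cls] = [mean(col) for col in zip(*members)]
    return model


def predict_with_margin(model, row):
    """Return (label, margin); a larger margin means a more confident guess."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, centre)), cls)
        for cls, centre in model.items()
    )
    return dists[0][1], dists[1][0] - dists[0][0]


def co_train(views, labels, rounds=10, per_round=1):
    """views: two feature matrices over the same emails.
    labels: 0 (legitimate), 1 (spam), or None for unlabeled emails."""
    seed = {i: y for i, y in enumerate(labels) if y is not None}
    known = [dict(seed), dict(seed)]  # one growing training set per view
    for _ in range(rounds):
        models = [
            fit_centroids([views[v][i] for i in known[v]], list(known[v].values()))
            for v in (0, 1)
        ]
        grew = False
        for v in (0, 1):
            peer = 1 - v
            # Emails for which the peer view has no label yet.
            pool = [i for i in range(len(labels)) if i not in known[peer]]
            scored = [(predict_with_margin(models[v], views[v][i]), i) for i in pool]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            for (label, _margin), i in scored[:per_round]:
                known[peer][i] = label  # view v teaches its peer
                grew = True
        if not grew:
            break
    models = [
        fit_centroids([views[v][i] for i in known[v]], list(known[v].values()))
        for v in (0, 1)
    ]
    return models, known


def classify(models, row_a, row_b):
    """Combine the views by trusting whichever classifier has the larger margin."""
    guesses = [
        predict_with_margin(models[0], row_a),
        predict_with_margin(models[1], row_b),
    ]
    return max(guesses, key=lambda g: g[1])[0]


# Hypothetical toy data: one feature per view, two labeled emails, four unlabeled.
view_a = [[0.0], [1.0], [0.1], [0.9], [0.2], [0.8]]
view_b = [[0.0], [1.0], [0.2], [0.8], [0.1], [0.9]]
labels = [0, 1, None, None, None, None]
models, pseudo = co_train([view_a, view_b], labels)
```

In each round, every view labels the unlabeled email it is most confident about and hands that pseudo-label to the other view, so each classifier learns from examples its peer found easy. The loop stops once neither view has anything left to teach.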
