A Survey of Machine Learning Algorithms and Their Application in Information Security

In this survey, we explore the breadth of machine learning applications to problems in information security. We introduce a wide variety of machine learning techniques and briefly discuss a sample of security-related applications of each.