Statistical learning methods for information security: fundamentals and case studies

One of the most traditional approaches to information security is simple sequence matching, as in signature-based virus detection. However, it is now well accepted that signature-based methods are no longer satisfactory for many security problems: a signature is usually too rigid, which makes detection hard to adjust and easy to bypass. Statistical learning approaches can complement signature matching to form an integrated defense system. Numerous statistical learning methods have been proposed over the last couple of decades for various applications. To solve information security problems statistically, we must carefully choose appropriate learning methods and evaluation procedures, so that a method that appears meaningful and effective under statistical analysis also proves beneficial when deployed in the real world. This paper aims to give an introductory and, as far as possible, self-contained overview of how to apply statistical methods to information security problems correctly and effectively. We also demonstrate a couple of applications of statistical learning methods to the problems of botnet detection and account security. Copyright © 2014 John Wiley & Sons, Ltd.
