Iterative Boolean combination of classifiers in the ROC space: An application to anomaly detection with HMMs

Hidden Markov models (HMMs) have been shown to provide a high level of performance for detecting anomalies in sequences of system calls to the operating system kernel. Using Boolean conjunction and disjunction functions to combine the responses of multiple HMMs in the ROC space may significantly improve performance over a "single best" HMM. However, these techniques assume that the classifiers are conditionally independent and that their ROC curves are convex. These assumptions are violated in most real-world applications, especially when classifiers are designed using limited and imbalanced training data. In this paper, the iterative Boolean combination (IBC) technique is proposed for efficient fusion of the responses from multiple classifiers in the ROC space. It applies all Boolean functions to combine the ROC curves corresponding to multiple classifiers, requires no prior assumptions, and has a time complexity that is linear in the number of classifiers. The results of computer simulations conducted on both synthetic and real-world host-based intrusion detection data indicate that the IBC of responses from multiple HMMs can achieve a significantly higher level of performance than the Boolean conjunction and disjunction combinations, especially when training data are limited and imbalanced. The proposed IBC is general in that it can be employed to combine diverse responses of any crisp or soft one- or two-class classifiers, in a wide range of application domains.
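To make the idea of Boolean combination in the ROC space concrete, the following is a minimal Python sketch and not the authors' implementation. It thresholds the scores of two detectors, fuses the resulting crisp decisions with several Boolean functions, and keeps the fused operating points that are not dominated in (false positive rate, true positive rate). The function names, the toy scores, the particular subset of Boolean functions, and the use of a simple non-dominated front in place of a ROC convex hull are all assumptions made for illustration.

```python
from itertools import product

# Illustrative subset of two-input Boolean functions (an assumption;
# the abstract only says that "all Boolean functions" are applied).
BOOLEAN_FUNCS = {
    "a AND b": lambda a, b: a and b,
    "a OR b": lambda a, b: a or b,
    "a XOR b": lambda a, b: a != b,
    "a AND NOT b": lambda a, b: a and not b,
    "NOT a AND b": lambda a, b: (not a) and b,
}


def threshold(scores, t):
    """Turn soft scores into crisp decisions: True means 'anomaly'."""
    return [s >= t for s in scores]


def roc_point(decisions, labels):
    """Return (fpr, tpr) for crisp decisions against 0/1 labels (1 = anomaly).

    Assumes both classes are present in `labels`.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    tp = sum(1 for d, y in zip(decisions, labels) if d and y == 1)
    fp = sum(1 for d, y in zip(decisions, labels) if d and y == 0)
    return fp / neg, tp / pos


def combine_pair(scores_a, scores_b, labels, funcs=BOOLEAN_FUNCS):
    """Fuse every pair of thresholds with every Boolean function.

    Returns a list of (fpr, tpr, function_name, threshold_a, threshold_b).
    """
    points = []
    for ta, tb in product(sorted(set(scores_a)), sorted(set(scores_b))):
        da, db = threshold(scores_a, ta), threshold(scores_b, tb)
        for name, f in funcs.items():
            fused = [f(a, b) for a, b in zip(da, db)]
            fpr, tpr = roc_point(fused, labels)
            points.append((fpr, tpr, name, ta, tb))
    return points


def pareto_front(points):
    """Keep points that no other point beats on both fpr and tpr.

    This is a simplification: it is a non-dominated front, not a convex hull.
    """
    front, best_tpr = [], -1.0
    for p in sorted(points, key=lambda p: (p[0], -p[1])):
        if p[1] > best_tpr:
            front.append(p)
            best_tpr = p[1]
    return front


# Toy validation data (hypothetical): 1 = anomalous, 0 = normal.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.2, 0.7, 0.1, 0.3]
scores_b = [0.3, 0.9, 0.8, 0.2, 0.75, 0.1]

for fpr, tpr, name, ta, tb in pareto_front(combine_pair(scores_a, scores_b, labels)):
    print(f"fpr={fpr:.2f} tpr={tpr:.2f} via {name} (ta={ta}, tb={tb})")
```

In this toy data, each detector misses one anomaly that the other catches, so a disjunction of suitably thresholded decisions reaches an operating point that neither detector attains alone. An iterative scheme in the spirit of the abstract would presumably fold further detectors into the fused result one at a time, which is consistent with the stated linear cost in the number of classifiers; that loop is not shown here.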
