Fusion of Decision Tree and Gaussian Mixture Models for Heterogeneous Data Sets

Current data mining techniques have been developed with great success on homogeneous data. However, few techniques handle heterogeneous data directly: most require further manipulation of the data or do not consider dependencies among the different types of attributes. This paper presents a fusion of the C4.5 Decision Tree and Gaussian Mixture Model (GMM) techniques for mixed-attribute data sets. The proposed fusion technique is used to detect anomalies in computer network data. Evaluation experiments were performed on the popular KDD Cup 1999 data set using the C4.5 Decision Tree, GMM, and fusions of C4.5 and GMM. In these experiments, the proposed fusion technique performed better than either individual technique.
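The abstract does not state how the C4.5 and GMM outputs are combined, so the following is only a minimal sketch of one plausible score-level fusion, not the authors' method. It assumes that a decision tree trained elsewhere supplies a posterior probability of attack, that two diagonal-covariance GMMs (one per class) model the continuous attributes, and that the two scores are mixed with a weight `alpha`. The names `gmm_log_likelihood`, `fuse`, and `alpha`, the equal-prior assumption, and all parameter values are hypothetical.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of vector x under a diagonal-covariance Gaussian mixture."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, m, v in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def fuse(tree_p_attack, x_cont, gmm_normal, gmm_attack, alpha=0.5):
    """Weighted sum of a tree posterior and a GMM posterior (assumed rule).

    tree_p_attack: attack probability from a decision tree (supplied, not trained here).
    x_cont: continuous attributes of one record.
    gmm_normal, gmm_attack: (weights, means, variances) tuples, one GMM per class.
    alpha: hypothetical mixing weight given to the tree score.
    """
    diff = (gmm_log_likelihood(x_cont, *gmm_attack)
            - gmm_log_likelihood(x_cont, *gmm_normal))
    # With equal class priors (an assumption), the GMM posterior is the
    # logistic function of the log-likelihood ratio; branch to avoid overflow.
    if diff >= 0:
        gmm_p_attack = 1.0 / (1.0 + math.exp(-diff))
    else:
        gmm_p_attack = math.exp(diff) / (1.0 + math.exp(diff))
    return alpha * tree_p_attack + (1.0 - alpha) * gmm_p_attack


# Toy single-component models over two continuous attributes.
normal_model = ([1.0], [[0.0, 0.0]], [[1.0, 1.0]])
attack_model = ([1.0], [[4.0, 4.0]], [[1.0, 1.0]])
score = fuse(0.9, (4.0, 4.0), normal_model, attack_model)
print(f"fused attack score: {score:.3f}")
```

A record would be flagged as anomalous when the fused score exceeds a threshold chosen on validation data. Other fusion rules, such as feeding GMM scores to the tree as an extra attribute, are equally consistent with the abstract.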
