A Higher Order Collective Classifier for detecting and classifying network events

Labeled data is scarce, yet most statistical machine learning techniques rely on a large labeled corpus to build robust models for prediction and classification. In this paper we present a Higher Order Collective Classifier (HOCC) based on Higher Order Learning, a statistical machine learning technique that leverages latent information present in co-occurrences of items across records. Such techniques violate the IID assumption that underlies most statistical machine learning methods and, in prior work, have outperformed first-order techniques when very little data is available. We present results of applying HOCC to two different network data sets: first, to detect and classify anomalies in a Border Gateway Protocol dataset, and second, to build models of users from Network File System calls in order to perform masquerade detection. For masquerade detection, the precision of our system is 30% better than that of the standard Naive Bayes technique. These results indicate that HOCC can successfully model a variety of network events and that the general framework proposed can be applied to difficult problems in security.
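To make the idea of "co-occurrences of items across records" concrete, the following is a minimal sketch of second-order co-occurrence: two items that never appear in the same record are linked when both co-occur with a shared intermediary item. This is an illustration only, not the HOCC algorithm itself; the function names and the toy records are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from itertools import combinations


def first_order_pairs(records):
    """Map each unordered item pair to the set of record indices containing both."""
    pairs = defaultdict(set)
    for idx, record in enumerate(records):
        for a, b in combinations(sorted(set(record)), 2):
            pairs[(a, b)].add(idx)
    return pairs


def second_order_pairs(records):
    """Find pairs (a, c) that never share a record but are linked through
    an intermediary b, i.e. a co-occurs with b and b co-occurs with c.

    Returns a dict mapping (a, c) to the set of intermediary items.
    """
    first = first_order_pairs(records)

    # Build an adjacency map from the first-order co-occurrences.
    neighbors = defaultdict(set)
    for a, b in first:
        neighbors[a].add(b)
        neighbors[b].add(a)

    linked = defaultdict(set)
    for b, adjacent in neighbors.items():
        for a, c in combinations(sorted(adjacent), 2):
            # Keep only pairs with no direct (first-order) co-occurrence.
            if (a, c) not in first:
                linked[(a, c)].add(b)
    return dict(linked)


# Hypothetical toy records; each record is a set of items.
records = [
    {"A", "B"},
    {"B", "C"},
    {"C", "D"},
]
result = second_order_pairs(records)
print(result)
```

Here "A" and "C" never share a record, yet they are connected through "B". Links of this kind are what a first-order model that treats records as independent would not see.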
