Classifying Evolving Data Streams for Intrusion Detection

Classifying stream data is challenging because of two properties of a data stream: its infinite length and its evolving nature. Traditional learning algorithms that require several passes over the training data are not directly applicable to the stream classification problem, because an infinite stream can never be stored or revisited in full. A data stream may evolve in several ways: the prior probability distribution p(c) of a class c may change, the probability p(x) of observing an example x may change, or both may change. In each case, the challenge is to build a classification model that is consistent with the current concept, so special techniques are required to classify evolving data streams. Network traffic can be regarded as a data stream with both of these properties. Network intrusion detection can therefore be treated as a stream classification problem in which each data point is either an intrusion or benign. A data point may represent, for example, a connection or a sequence of N network packets.
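
To make the two properties concrete, the following is a minimal illustrative sketch, not the classification technique itself. It estimates the class prior p(c) in a single pass over a labelled stream while keeping only the most recent labels in memory, so the estimate follows a change in p(c). The class name SlidingWindowPrior, the window size of 200, and the simulated label stream are all assumptions made for this example.

```python
from collections import Counter, deque


class SlidingWindowPrior:
    """Estimate class priors p(c) from the most recent `window` labels only."""

    def __init__(self, window):
        self.window = window
        self.labels = deque()
        self.counts = Counter()

    def update(self, label):
        # Single pass: each label is seen once and forgotten when it
        # slides out of the window, so memory use stays bounded.
        self.labels.append(label)
        self.counts[label] += 1
        if len(self.labels) > self.window:
            oldest = self.labels.popleft()
            self.counts[oldest] -= 1

    def prior(self, label):
        n = len(self.labels)
        return self.counts[label] / n if n else 0.0


def simulated_stream():
    # Hypothetical labels (an assumption for illustration): the first
    # 1000 points are mostly benign, the next 1000 mostly intrusions.
    for i in range(1000):
        yield "intrusion" if i % 10 == 0 else "benign"
    for i in range(1000):
        yield "benign" if i % 10 == 0 else "intrusion"


estimator = SlidingWindowPrior(window=200)
snapshots = []
for t, label in enumerate(simulated_stream(), start=1):
    estimator.update(label)
    if t % 1000 == 0:
        # Record the estimated prior of the intrusion class.
        snapshots.append(estimator.prior("intrusion"))

print(snapshots)
```

The first snapshot reflects the mostly benign phase and the second the mostly intrusive phase. An estimate computed over the whole stream would instead average the two phases and match neither, which is why a model must track the current concept.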