Adaptive event detection with time-varying Poisson processes

Time series of count data are generated in many different contexts, such as web access logging, freeway traffic monitoring, and security logs associated with buildings. Since this data measures the aggregated behavior of individual human beings, it typically exhibits periodicity on a number of time scales (daily, weekly, etc.) that reflects the rhythms of the underlying human activity and makes the data appear non-homogeneous. At the same time, the data is often corrupted by bursty periods of unusual behavior such as building events, traffic accidents, and so forth. Both of these elements make the data mining problem of finding and extracting these anomalous events difficult. In this paper we describe a framework for unsupervised learning in this context, based on a time-varying Poisson process model that can also account for anomalous events. We show how the parameters of this model can be learned from count time series using statistical estimation techniques. We demonstrate the utility of this model on two datasets for which we have partial ground truth in the form of known events, one from freeway traffic data and another from building access data, and show that the model performs significantly better than a non-probabilistic, threshold-based technique. We also describe how the model can be used to investigate different degrees of periodicity in the data, including systematic day-of-week and time-of-day effects, and to make inferences about the detected events (e.g., popularity or level of attendance). Our experimental results indicate that the proposed time-varying Poisson model provides a robust and accurate framework for adaptively and autonomously learning how to separate unusual bursty events from traces of normal human activity.
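
To make the idea of a time-varying Poisson rate with day-of-week and time-of-day effects concrete, the following is a minimal sketch and not the paper's method. It assumes equally spaced count bins starting at the first slot of a week, estimates one rate per (day-of-week, slot) pair as a plain average, and flags bins whose count is improbably high under that rate. The function names, the `slots_per_day` parameter, and the `alpha` cutoff are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def fit_periodic_rates(counts, slots_per_day):
    """Average count for each (day-of-week, time-of-day slot) pair.

    Assumption: counts[0] is the first slot of day 0 of a week.
    """
    sums = defaultdict(float)
    nums = defaultdict(int)
    for t, c in enumerate(counts):
        key = ((t // slots_per_day) % 7, t % slots_per_day)
        sums[key] += c
        nums[key] += 1
    return {key: sums[key] / nums[key] for key in sums}


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def flag_bursts(counts, slots_per_day, alpha=1e-3):
    """Indices whose count is improbably high under the periodic rate."""
    rates = fit_periodic_rates(counts, slots_per_day)
    flagged = []
    for t, c in enumerate(counts):
        lam = rates[((t // slots_per_day) % 7, t % slots_per_day)]
        if poisson_upper_tail(c, lam) < alpha:
            flagged.append(t)
    return flagged


# Hypothetical data: 4 weeks, 2 slots per day, one injected burst.
counts = [5, 20] * (7 * 4)
counts[10] = 40
print(flag_bursts(counts, slots_per_day=2))
```

Note that in this sketch the burst itself is included in the average for its slot, so it inflates the estimated normal rate. That contamination is one reason to model normal activity and anomalous events jointly, as the abstract proposes, rather than in two separate passes.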
