Probabilistic learning for analysis of sensor-based human activity data

As sensors that measure daily human activity become more affordable and ubiquitous, there is a growing need for algorithms that extract useful information from the resulting observations. Many of these sensors record a time series of counts that reflects two behaviors: (1) the underlying hourly, daily, and weekly rhythms of natural human activity, and (2) bursty periods of unusual behavior. This dissertation explores a probabilistic framework for human-generated count data that (a) models the underlying recurrent patterns and (b) simultaneously separates and characterizes unusual activity via a Poisson-Markov model. It investigates event detection and characterization on noisy, real-world sensor data in which sensor failures leave significant portions of the measurements missing or corrupted. The framework is then extended to perform higher-level inferences, such as linking event models in a multi-sensor building occupancy model, and incorporating the occupancy measurement from loop detectors (in addition to the count measurement) so that the model can be applied to problems in transportation research.
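To make the two-component idea concrete, the following is a minimal generative sketch of one plausible reading of such a Poisson-Markov model. It is not the dissertation's implementation. It assumes a two-state Markov chain that switches an "event" on and off, a periodic Poisson rate for normal activity, and an additive Poisson count during events. The function names (`sample_poisson`, `normal_rate`, `simulate`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def normal_rate(hour_of_week, base_rate=10.0):
    """Hypothetical periodic rate: a base rate scaled by day and hour effects."""
    day, hour = divmod(hour_of_week % 168, 24)
    day_effect = 0.4 if day >= 5 else 1.0  # assumed quieter weekends
    hour_effect = 1.0 + 0.8 * math.sin(math.pi * (hour - 6) / 12)  # midday peak
    return base_rate * day_effect * hour_effect


def simulate(n_hours, p_start=0.02, p_stay=0.7, event_rate=15.0, seed=0):
    """Simulate hourly counts as normal activity plus event-driven extras.

    Returns a list of (observed_count, in_event) pairs. The hidden event
    state follows a two-state Markov chain (an assumption of this sketch).
    """
    rng = random.Random(seed)
    in_event = False
    records = []
    for t in range(n_hours):
        # Markov transition: start an event, or remain in the current one.
        in_event = rng.random() < (p_stay if in_event else p_start)
        baseline = sample_poisson(normal_rate(t), rng)
        extra = sample_poisson(event_rate, rng) if in_event else 0
        records.append((baseline + extra, in_event))
    return records


if __name__ == "__main__":
    for hour, (count, in_event) in enumerate(simulate(48, seed=1)):
        print(hour, count, "event" if in_event else "")
```

Inference runs in the opposite direction: given only the observed counts, the task is to recover both the periodic rate and the hidden event indicator, which is why the two components have to be estimated jointly rather than one after the other.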
