Mining Malicious Corruption of Data with Hidden Markov Models

Data mining algorithms have recently been applied to a wide range of research problems. In this paper we describe an alternative technique that profiles databases via time series analysis in order to detect anomalous changes to a database. We view the history of modifications to the database as a set of time series. We then examine the application of hidden Markov models (HMMs) as a mining tool that captures the normal trend of changes made to a database by its transactions. Rather than examining each record independently, our technique accounts for relations among groups of records and validates modifications against the sequence of transactions. The algorithm adapts to changes in behavior. Experiments with real data sets, comparing various options for the initial HMM parameters, demonstrate that the distribution of the change in acceptance probability for anomalous values differs significantly from that for transactions the model expects.
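To make the idea of a "change in acceptance probability" concrete, the following is a minimal sketch, not the paper's implementation. It assumes a discrete HMM whose observations are discretised change sizes, scores a sequence with the standard scaled forward algorithm, and compares how much the log-probability moves when an expected versus an unusual value is appended. The function names, the two-state model, and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    n = len(start)
    # Initialise with the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        # Rescale to avoid underflow; accumulate the log of the scale factors.
        alpha = [a / scale for a in alpha]
        log_like += math.log(scale)
    return log_like


def acceptance_change(history, new_obs, start, trans, emit):
    """Change in log-probability caused by appending new_obs to history."""
    extended = forward_log_likelihood(history + [new_obs], start, trans, emit)
    return extended - forward_log_likelihood(history, start, trans, emit)


# Hypothetical model (assumed values, not taken from the paper).
# Observation symbols: 0 = small change, 1 = medium change, 2 = large change.
START = [0.9, 0.1]
TRANS = [[0.95, 0.05], [0.30, 0.70]]
EMIT = [[0.80, 0.19, 0.01], [0.10, 0.30, 0.60]]

history = [0, 0, 1, 0, 0, 0]
expected = acceptance_change(history, 0, START, TRANS, EMIT)
anomalous = acceptance_change(history, 2, START, TRANS, EMIT)
print("expected value:", expected)
print("anomalous value:", anomalous)
```

Under these assumed parameters the unusual value produces a much larger drop in log-probability than the expected one. A detector built along these lines would compare that drop against a threshold, which this sketch does not attempt to choose.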
