Hidden Markov Models for Automated Protocol Learning

Hidden Markov Models (HMMs) have applications in several areas of computer security. One drawback of HMMs is the difficulty of selecting appropriate model parameters, a process that is often ad hoc or requires domain-specific knowledge. While algorithms exist to find locally optimal values for some parameters, the number of states must always be specified in advance, and this choice directly affects the accuracy and generality of the model. In addition, domain knowledge is not always available, or may rest on assumptions that prove incorrect or suboptimal.
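To make the role of the state count concrete, the following is a minimal sketch, assuming a discrete-observation HMM with an initial distribution `pi`, a transition matrix `A`, and an emission matrix `B`. The function names `hmm_free_parameters` and `forward_log_likelihood` are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch shows that the number of states fixes the shape of every parameter before any fitting can begin, and that the standard scaled forward algorithm then scores a sequence under those fixed-size parameters.

```python
import math


def hmm_free_parameters(n_states, n_symbols):
    """Count the free parameters of a discrete HMM of the given size.

    Each probability row sums to 1, so it has one fewer free value
    than it has entries.
    """
    initial = n_states - 1                   # initial distribution pi
    transitions = n_states * (n_states - 1)  # rows of A
    emissions = n_states * (n_symbols - 1)   # rows of B
    return initial + transitions + emissions


def forward_log_likelihood(pi, A, B, observations):
    """Scaled forward algorithm: log P(observations | pi, A, B).

    pi[i]   : probability of starting in state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting symbol k while in state i
    Assumes every observation has nonzero probability under the model.
    """
    n_states = len(pi)  # the state count is implied by the parameter shapes
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    log_likelihood = 0.0
    for t in range(len(observations)):
        if t > 0:
            # Propagate one step through A, then weight by the emission.
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n_states))
                * B[j][observations[t]]
                for j in range(n_states)
            ]
        # Rescale to avoid underflow and accumulate the log of the scale.
        scale = sum(alpha)
        log_likelihood += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_likelihood
```

Under these assumptions, moving from two to three states over a binary alphabet raises the free-parameter count from 5 to 11, which illustrates why the choice of state count trades accuracy against generality: a larger model can fit the training data more closely but has more parameters to estimate from the same data.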
