Paper information - Spam Deobfuscation using a Hidden Markov Model

To circumvent spam filters, many spammers attempt to obfuscate their emails by deliberately misspelling words or introducing other errors into the text. For example, "viagra" may be written "vigra", or "mortgage" written "m0rt gage". Even though humans have little difficulty reading obfuscated emails, most content-based filters are unable to recognize these obfuscated spam words. In this paper, we present a hidden Markov model for deobfuscating spam emails. We empirically demonstrate that our model is robust to many types of obfuscation, including misspellings, incorrect segmentations (adding/removing spaces), and substitutions/insertions of non-alphabetic characters.
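To make the idea in the abstract concrete, the following is a minimal toy sketch of dictionary-based deobfuscation with a left-to-right hidden Markov model and Viterbi scoring. It is not the model from the paper: the lexicon, the look-alike table, the transition and emission probabilities, and the function names (`emit_logp`, `word_score`, `deobfuscate`) are all assumptions chosen for illustration, and the probabilities are rough scores rather than carefully normalized distributions. Each hidden state is a position inside a lexicon word; a "skip" transition models a deleted letter and a "stay" transition models an inserted character such as a space.

```python
import math

# Hypothetical toy dictionary and look-alike table (assumptions, not from the paper).
LEXICON = ["viagra", "mortgage", "free"]
LOOKALIKES = {"o": "0", "i": "1!|", "a": "@4", "e": "3", "s": "$5"}

# Assumed transition scores: advance one letter, skip a letter (deletion),
# or stay on the same letter (an inserted character, e.g. a space).
LOG_NEXT = math.log(0.8)
LOG_SKIP = math.log(0.1)
LOG_STAY = math.log(0.1)
NEG_INF = float("-inf")


def emit_logp(letter, observed):
    """Log-score of seeing `observed` when the intended letter is `letter`."""
    if observed == letter:
        return math.log(0.8)
    if observed in LOOKALIKES.get(letter, ""):
        return math.log(0.15)
    return math.log(0.00125)  # any other character is unlikely but possible


def word_score(word, text):
    """Viterbi log-score of the best path that starts at the first letter
    of `word` and ends at its last letter while emitting `text`."""
    n = len(word)
    delta = [NEG_INF] * n
    delta[0] = emit_logp(word[0], text[0])
    for ch in text[1:]:
        new = [NEG_INF] * n
        for i in range(n):
            best = delta[i] + LOG_STAY
            if i >= 1:
                best = max(best, delta[i - 1] + LOG_NEXT)
            if i >= 2:
                best = max(best, delta[i - 2] + LOG_SKIP)
            new[i] = best + emit_logp(word[i], ch)
        delta = new
    return delta[-1]


def deobfuscate(text):
    """Return the lexicon word whose best state path scores highest
    under a uniform prior over words. Assumes `text` is non-empty."""
    prior = -math.log(len(LEXICON))
    return max(LEXICON, key=lambda w: prior + word_score(w, text))


if __name__ == "__main__":
    for obfuscated in ["v1agra", "vigra", "m0rt gage"]:
        print(repr(obfuscated), "->", deobfuscate(obfuscated))
```

In this sketch the three obfuscations named in the abstract map onto three mechanisms: a character substitution is absorbed by the emission scores, a dropped letter by the skip transition, and an inserted space by the stay transition. A realistic system would learn these parameters from data and decode whole sentences rather than isolated words.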

Honglak Lee | Andrew Y. Ng
