Paper Information - Author Identification of E-mail Messages with OLMAM Trained Feedforward Neural Networks

The OLMAM algorithm (optimized Levenberg-Marquardt with adaptive momentum) is a variant of the Levenberg-Marquardt algorithm for training multilayer feedforward neural networks. OLMAM has been shown to obtain excellent solutions in difficult classification problems where other computational intelligence techniques usually perform worse. In this paper we apply OLMAM to author identification of e-mail messages, a challenging classification problem due to the special characteristics of the data. We ran a number of experiments on a corpus of real-world e-mail messages (the Enron corpus) and compared the performance of the proposed method with that of Naive Bayes and SVM classifiers. Author identification with OLMAM was found to be significantly better than with the other methods, even when the author wrote about different topics.

Nikolaos Ampazis | G. Dounias | H. Iakovaki
