Mining E-mail Authorship

In this paper we report an investigation into the learning of authorship identification or categorisation for the case of e-mail documents. We use various e-mail document features, such as structural characteristics and linguistic evidence, together with the Support Vector Machine as the learning algorithm. Experiments on a number of e-mail documents give promising results, with some e-mail document features and author categories giving better categorisation performance than others.
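To make the idea of structural and linguistic features concrete, the following is a minimal sketch of how an e-mail body might be turned into a fixed-length feature vector. The specific features (greeting flag, blank-line ratio, mean word length, type/token ratio, function-word frequencies), the word list, and the name `extract_features` are illustrative assumptions; the abstract does not list the features actually used.

```python
import re
from collections import Counter

# Hypothetical function-word list, chosen for illustration only.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "i", "it")

# Hypothetical greeting pattern used as a structural cue.
GREETING = re.compile(r"^\s*(hi|hello|dear|hey)\b", re.IGNORECASE)


def extract_features(body: str) -> list:
    """Map an e-mail body to a fixed-length vector of style features."""
    lines = body.splitlines()
    non_blank = [ln for ln in lines if ln.strip()]
    first_line = non_blank[0] if non_blank else ""

    words = re.findall(r"[A-Za-z']+", body.lower())
    n_words = max(len(words), 1)  # avoid division by zero on empty bodies
    counts = Counter(words)

    # Structural characteristics: layout cues rather than word choice.
    structural = [
        1.0 if GREETING.match(first_line) else 0.0,        # opens with a greeting
        (len(lines) - len(non_blank)) / max(len(lines), 1),  # blank-line ratio
    ]

    # Linguistic evidence: simple lexical statistics.
    linguistic = [
        sum(len(w) for w in words) / n_words,  # mean word length
        len(set(words)) / n_words,             # type/token ratio
    ]
    # Relative frequency of each function word.
    linguistic += [counts[w] / n_words for w in FUNCTION_WORDS]

    return structural + linguistic
```

Vectors of this kind, one per message and labelled with the author, could then be passed to any Support Vector Machine implementation (for example `sklearn.svm.SVC`, assuming scikit-learn is available) to train an author classifier.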
