Email Classification with Temporal Features

We propose a novel solution to the email classification problem: integrating temporal information with traditional content-based classification approaches. We discover temporal relations in an email sequence in the form of temporal sequential patterns and embed the discovered information into content-based learning methods. The resulting heterogeneous classification system performs well, reducing the classification error by up to 22%.
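As a rough illustration of combining temporal patterns with content features, the sketch below mines frequent consecutive label pairs from a time-ordered email history and appends their support counts to a content feature vector. The abstract does not specify the mining algorithm or the embedding scheme, so everything here is an assumption made for illustration: the function names (`mine_patterns`, `temporal_features`, `combine`), the restriction to length-2 patterns, the `min_support` threshold, and the toy data are all hypothetical and are not the authors' method.

```python
from collections import Counter


def mine_patterns(labels, min_support=2):
    """Count consecutive label pairs (previous -> next) and keep the frequent ones.

    Assumption: a length-2 pattern over adjacent emails stands in for the
    richer temporal sequential patterns described in the abstract.
    """
    counts = Counter(zip(labels, labels[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def temporal_features(prev_label, patterns, classes):
    """For each candidate class, return the support of (prev_label -> class)."""
    return [patterns.get((prev_label, c), 0) for c in classes]


def combine(content_features, prev_label, patterns, classes):
    """Append temporal features to an existing content-based feature vector."""
    return list(content_features) + temporal_features(prev_label, patterns, classes)


# Toy, made-up history of email categories in arrival order.
history = ["work", "work", "personal", "work", "work", "spam", "work", "work"]
classes = ["personal", "spam", "work"]

patterns = mine_patterns(history)

# Hypothetical content features (e.g. two term weights) for a new email
# that arrives right after an email labelled "work".
features = combine([0.3, 0.7], "work", patterns, classes)
print(patterns)
print(features)
```

The combined vector can then be passed to any content-based learner; the temporal columns simply give it access to what tends to follow the previous email's category.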
