A framework for adaptive mail classification

We introduce a technique, based on data mining algorithms, for classifying incoming messages; it serves as the basis for an overall architecture for maintaining and managing e-mail messages. We use clustering techniques to group, in an unsupervised way, structured and unstructured information extracted from e-mail messages, and we apply the resulting algorithm to folder creation (and maintenance) and e-mail redirection. Initial experimental results show the effectiveness of the technique in terms of both efficiency and quality of results.
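The abstract does not spell out the clustering algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a simple incremental scheme: each message becomes a token set that mixes one structured field (the sender) with unstructured text (subject and body), and a message joins the folder containing its most similar member under Jaccard similarity, or opens a new folder when nothing is similar enough. The function names, the message fields, and the `threshold` value are all hypothetical.

```python
import re


def features(msg):
    """Combine a structured field (sender) and unstructured text into one token set."""
    text = (msg["subject"] + " " + msg["body"]).lower()
    tokens = set(re.findall(r"[a-z]+", text))
    tokens.add("from:" + msg["sender"].lower())  # structured information
    return tokens


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign(folders, msg, threshold=0.2):
    """Redirect msg to the most similar folder, or create a new one.

    folders is a list of folders; each folder is a list of token sets.
    Returns the index of the folder the message ended up in.
    The threshold is an assumed, untuned value.
    """
    f = features(msg)
    best, best_sim = None, 0.0
    for i, folder in enumerate(folders):
        # Similarity to a folder = similarity to its closest member.
        sim = max(jaccard(f, g) for g in folder)
        if sim > best_sim:
            best, best_sim = i, sim
    if best is None or best_sim < threshold:
        folders.append([f])  # folder creation
        return len(folders) - 1
    folders[best].append(f)  # e-mail redirection
    return best


# Hypothetical sample messages, made up for illustration.
messages = [
    {"sender": "alice@example.com", "subject": "Project budget review",
     "body": "Please review the project budget spreadsheet"},
    {"sender": "alice@example.com", "subject": "Budget update",
     "body": "Updated project budget attached"},
    {"sender": "bob@example.org", "subject": "Football on Saturday",
     "body": "Who is playing football this weekend"},
    {"sender": "bob@example.org", "subject": "Saturday football cancelled",
     "body": "Rain this weekend"},
]

folders = []
labels = [assign(folders, m) for m in messages]
print(labels)
```

On these four made-up messages the sketch opens two folders, one for the budget thread and one for the football thread. A real system would likely weight fields differently and revisit folder assignments over time, which this sketch does not attempt.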
