Email labelling by rough clustering

In the past there were few ways to sort letters, and later emails: we could only arrange them into folders. A letter or an email could be put into exactly one folder, based on sender, subject or priority. Later, Gmail introduced labels for emails: the multiple labels that can be assigned to one email generate virtual folders. This kind of labelling was adopted by other mail clients and by photo and music organizer software, too. It is a pleasure to use a well-organized collection, but setting up its labelling is usually painstaking. We simplify this kind of task by using our experience in rough set theory and clustering. Clustering is a well-known part of data mining, in which elements are grouped by their similarity. Similarity is an inexact concept in real life: e.g. we may easily mix up two Japanese people, whereas a Chinese person easily differentiates them. We suggest that rough clustering could help to combine similarity-based data from different sources, because it has some error-correcting property. In this article we present a method to label emails and its mathematical background.
