Managing email overload with an automatic nonparametric clustering system

Email overload is a recent problem: people find it increasingly difficult to process the large number of emails they receive daily. The problem is becoming more serious and already affects the normal use of email as a knowledge management tool. Categorizing emails into meaningful groups has been recognized as greatly reducing the cognitive load of processing them, and is thus an effective way to manage email overload. However, most current approaches still require significant human input to categorize emails. In this paper, we develop an automatic email clustering system underpinned by a new nonparametric text clustering algorithm. The system requires no predefined input parameters and automatically generates meaningful email clusters. Our evaluation shows that the new algorithm outperforms existing text clustering algorithms in both computational time and clustering quality, as measured by several metrics. The experimental results also closely match the human-labeled clustering results.
