Inferring Ongoing Activities of Workstation Users by Clustering Email

We are interested in automatically discovering the key ongoing activities of a workstation user, such as the committees to which she belongs and the writing projects in which she is involved, based on the contents of her workstation. The thesis underlying our research is that this collection of user activities can be inferred automatically from the variety of data available on most users' workstations, including their email, files, online calendar, and history of web page accesses. Knowledge of the user's activities could be used in a variety of ways, such as cross-indexing email, calendar events, files, and web accesses according to activity, or automatically producing a 'briefing folder' for each meeting on the user's calendar. We describe here our initial research on inferring such activities by examining only the user's email. In particular, we describe several unsupervised methods for clustering emails by user activity, and the use of information extractors and pretrained classifiers to infer additional information about each discovered cluster. Experimental results are presented for emails from three users.
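
To make the idea of clustering emails by activity concrete, the following is a minimal illustrative sketch and not the method evaluated in this work. It assumes a bag-of-words representation, cosine similarity, and a greedy single-pass assignment rule. The function names (`tokenize`, `cosine`, `cluster_emails`), the stopword list, the similarity threshold, and the sample messages are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this illustration.
STOPWORDS = {"the", "a", "an", "to", "of", "for", "and", "on", "in", "is", "at"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_emails(emails, threshold=0.3):
    """Greedy single-pass clustering (an assumed, simplified scheme).

    Each email joins the most similar existing cluster if the similarity
    to that cluster's summed term counts is at least `threshold`;
    otherwise it starts a new cluster. Returns lists of email indices.
    """
    centroids = []  # summed term counts, one Counter per cluster
    members = []    # email indices, one list per cluster
    for i, text in enumerate(emails):
        vec = Counter(tokenize(text))
        best, best_sim = None, threshold
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            centroids.append(vec)
            members.append([i])
        else:
            centroids[best].update(vec)
            members[best].append(i)
    return members


# Made-up messages standing in for two activities.
emails = [
    "Hiring committee meeting agenda for Tuesday",
    "Draft of the grant proposal budget section",
    "Agenda items for the hiring committee meeting",
    "Comments on the grant proposal draft",
]
clusters = cluster_emails(emails)
print(clusters)
```

On these four messages the sketch groups the two hiring-committee emails together and the two grant-proposal emails together. A realistic system would likely need term weighting, use of sender and recipient fields, and a more robust clustering procedure; the threshold of 0.3 is an arbitrary choice for this example.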
