Inferring users' projects from their workstation contents

One key to providing intelligent assistance to workstation users is to construct machine-understandable descriptions of the user's ongoing projects, or activities (e.g., committee memberships, writing projects, conference organization activities), together with indices describing which emails, meetings, and colleagues relate to which activity. This paper presents a program, ActivityExtractor, which examines the user's workstation contents to infer such activity descriptions. In earlier work [Huang, et al., 2004] we described an algorithm that infers activities by clustering the user's emails based on their word distributions. Here we extend this approach in three ways: (1) by incorporating a social network analysis of email senders and recipients, (2) by considering workstation contents beyond email, including contents accessible via Google Desktop Search, and (3) by allowing simple user input in the form of a list of activity/project names. We describe the ActivityExtractor algorithms and report on experiments applying them to several users' workstations.

[1] H. White, et al. Structural Equivalence of Individuals in Social Networks, 1977.

[2] William J. Cook, et al. Combinatorial optimization, 1997.

[3] Kenrick J. Mock. An experimental framework for email categorization and management, 2001, SIGIR '01.

[4] Stanley Wasserman, et al. Social Network Analysis: Methods and Applications, 1994.

[5] Sebastian Thrun, et al. Text Classification from Labeled and Unlabeled Documents using EM, 2000, Machine Learning.

[6] Martin Wattenberg, et al. ReMail: a reinvented email prototype, 2004, CHI EA '04.

[7] Tom M. Mitchell, et al. Learning to Classify Email into “Speech Acts”, 2004, EMNLP.

[8] Andrew McCallum, et al. Topic and Role Discovery in Social Networks, 2005, IJCAI.

[9] Tessa A. Lau, et al. Automated email activity management: an unsupervised learning approach, 2005, IUI.

[10] Stanley Wasserman, et al. Social Network Analysis: Methods and Applications, 1994, Structural analysis in the social sciences.

[11] David R. Karger, et al. Haystack: A General-Purpose Information Management Tool for End Users Based on Semistructured Data, 2005, CIDR.

[12] E. A. Dinic. Algorithm for solution of a problem of maximal flow in a network with power estimation, 1970.

[13] Bernardo A. Huberman, et al. Email as spectroscopy: automated discovery of community structure within organizations, 2003.

[14] Victoria Bellotti, et al. E-mail as habitat: an exploration of embedded personal information management, 2001, INTR.

[15] Andrew McCallum, et al. The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email, 2005.

[16] Andrew McCallum, et al. Extracting social networks and contact information from email and the Web, 2004, CEAS.

[17] Li-Te Cheng, et al. Supporting activity-centric collaboration through peer-to-peer shared objects, 2003, GROUP '03.

[18] Lada A. Adamic, et al. Information flow in social groups, 2003, cond-mat/0305305.

[19] John C. Platt, et al. Automatic Discovery of Personal Topics to Organize Email, 2005, CEAS.

[20] D. R. Fulkerson, et al. Flows in Networks, 1964.

[21] Andrew V. Goldberg, et al. A new approach to the maximum flow problem, 1986, STOC '86.

[22] M. R. Rao, et al. Combinatorial Optimization, 1992, NATO ASI Series.

[23] Albert-László Barabási, et al. Statistical mechanics of complex networks, 2001, ArXiv.

[24] Tom M. Mitchell, et al. Inferring Ongoing Activities of Workstation Users by Clustering Email, 2004, CEAS.