Active EM to reduce noise in activity recognition

Intelligent desktop environments allow users to define a set of projects or activities that characterize their desktop work. These environments then attempt to identify the user's current activity in order to provide various kinds of assistance. Such systems take a hybrid approach: they let the user declare the current activity, but they also employ learned classifiers to predict it, covering the cases where the user forgets to make a declaration. These classifiers must be trained on the very noisy data obtained from the user's activity declarations. Instead of asking the user to review and relabel the data manually, we employ an active EM algorithm that combines the EM algorithm with active learning. EM can be viewed as a classifier retraining on its own predictions; to make it more robust, we retrain only on predictions made with high confidence. For active learning, we ask the user to label a small number of the instances on which the classifier is most uncertain. Experimental results on real users show that this active EM algorithm can significantly improve prediction precision and that it performs better than either EM or active learning alone.
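The abstract does not name the underlying classifier or fix the exact order of the EM and query steps, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It uses a hypothetical nearest-centroid classifier with softmax confidences as a stand-in for the learned activity classifier, and all helper names (`fit_centroids`, `predict_proba`, `active_em`, `ask_user`) and parameters (`n_rounds`, `n_queries`, `threshold`) are illustrative. Each round (1) asks the user to label the `n_queries` unverified instances with the lowest top-class probability, then (2) replaces the remaining labels with the classifier's predictions, keeping for retraining only user-verified instances and predictions whose confidence reaches `threshold`.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]
# Hypothetical stand-in classifier: one centroid (mean vector) per activity label.
Model = Dict[str, List[float]]


def fit_centroids(X: Sequence[Vector], labels: Sequence[str], idx: Sequence[int]) -> Model:
    """Fit the stand-in classifier on the instances listed in idx."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for i in idx:
        c = labels[i]
        if c not in sums:
            sums[c] = [0.0] * len(X[i])
            counts[c] = 0
        sums[c] = [s + v for s, v in zip(sums[c], X[i])]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_proba(model: Model, x: Vector) -> Dict[str, float]:
    """Softmax over negative squared distances; the largest value is the confidence."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(x, mu)) for c, mu in model.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def active_em(
    X: Sequence[Vector],
    noisy_labels: Sequence[str],
    ask_user: Callable[[int], str],  # hypothetical oracle: returns the true activity
    n_rounds: int = 3,
    n_queries: int = 1,
    threshold: float = 0.9,
) -> Tuple[Model, List[str]]:
    labels = list(noisy_labels)  # copy; the caller's declarations are not modified
    verified: Set[int] = set()
    train_idx = list(range(len(X)))  # the first round trains on the raw, noisy declarations
    for _ in range(n_rounds):
        model = fit_centroids(X, labels, train_idx)
        probs = [predict_proba(model, x) for x in X]
        # Active-learning step: query the least-confident unverified instances.
        unverified = [i for i in range(len(X)) if i not in verified]
        for i in sorted(unverified, key=lambda j: max(probs[j].values()))[:n_queries]:
            labels[i] = ask_user(i)
            verified.add(i)
        # EM-style step: adopt confident predictions; leave uncertain instances out.
        train_idx = sorted(verified)
        for i in unverified:
            if i in verified:
                continue
            best = max(probs[i], key=probs[i].get)
            if probs[i][best] >= threshold:
                labels[i] = best
                train_idx.append(i)
    return fit_centroids(X, labels, train_idx), labels
```

In this sketch, low-confidence instances are simply excluded from the next training set, which is one way to realize "retraining only on high-confidence predictions"; a real system might instead down-weight them. For example, with one-dimensional features `[[0.0], [1.0], [2.0], [4.0], [5.0], [6.0]]`, true labels `A, A, A, B, B, B`, and the third declaration wrongly recorded as `B`, the first query goes to that mislabeled instance (index 2), and after three rounds the returned labels match the true ones.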
