Mining models of human activities from the web

The ability to determine what day-to-day activity (such as cooking pasta, taking a pill, or watching a video) a person is performing is of interest in many application domains. A system that can do this requires models of the activities of interest, but model construction does not scale well: humans must specify low-level details, such as segmentation and feature selection of sensor data, and high-level structure, such as spatio-temporal relations between states of the model, for each and every activity. As a result, previous practical activity recognition systems have been content to model a tiny fraction of the thousands of human activities that are potentially useful to detect. In this paper, we present an approach to sensing and modeling activities that scales to a much larger class of activities than before. We show how a new class of sensors, based on Radio Frequency Identification (RFID) tags, can directly yield semantic terms that describe the state of the physical world. These sensors allow us to formulate activity models by translating labeled activities, such as 'cooking pasta', into probabilistic collections of object terms, such as 'pot'. Given this view of activity models as text translations, we show how to mine definitions of activities in an unsupervised manner from the web. We have used our technique to mine definitions for over 20,000 activities. We experimentally validate our approach using data gathered from actual human activity as well as simulated data.
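To make the idea of an activity model as a "probabilistic collection of object terms" concrete, the following is a minimal sketch, not the method used in the paper. It assumes a small hypothetical vocabulary of tagged objects (`OBJECT_TERMS`) and a few made-up sentences standing in for web pages (`PAGES`). It estimates how likely each object term is to appear for an activity, then picks the activity that best explains a set of observed objects. The function names, the smoothing constant, and the substring matching are all illustrative assumptions.

```python
import math

# Hypothetical vocabulary of RFID-tagged objects (an assumption for this sketch).
OBJECT_TERMS = ["pot", "stove", "pasta", "glass", "pill bottle"]

# Made-up text snippets standing in for web pages that describe each activity.
PAGES = {
    "cooking pasta": [
        "fill a pot with water and put it on the stove",
        "add the pasta to the pot",
    ],
    "taking a pill": [
        "open the pill bottle and fill a glass with water",
        "swallow the pill with a glass of water",
    ],
}


def mine_model(pages, terms=OBJECT_TERMS, smoothing=0.5):
    """Estimate P(term | activity) as a smoothed fraction of pages that mention the term.

    Substring matching is deliberately crude (for example, "pot" would also
    match "potato"); a real system would need proper tokenization.
    """
    n = len(pages)
    return {
        term: (sum(term in page for page in pages) + smoothing) / (n + 2 * smoothing)
        for term in terms
    }


def log_likelihood(observed_objects, model, floor=0.01):
    """Sum of log-probabilities of the observed objects under one activity model."""
    return sum(math.log(model.get(obj, floor)) for obj in observed_objects)


def most_likely_activity(observed_objects, models):
    """Return the activity whose model gives the observations the highest score."""
    return max(models, key=lambda name: log_likelihood(observed_objects, models[name]))


# Build one model per activity from the stand-in pages.
MODELS = {activity: mine_model(pages) for activity, pages in PAGES.items()}

if __name__ == "__main__":
    print(most_likely_activity(["pot", "stove"], MODELS))
    print(most_likely_activity(["glass", "pill bottle"], MODELS))
```

Treating each observed object as independent given the activity keeps the sketch short; it ignores ordering and timing of object use, which a fuller model would likely need to capture.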
