Self-supervised Mining of Human Activity from CGM

The goal of this paper is to describe a method that automatically extracts all basic attributes of an activity, namely actor, action, object, time, and location, together with the transitions between activities, from each sentence retrieved from Japanese CGM (consumer generated media). Previous work has several limitations: high setup cost, inability to extract all attributes, restrictions on the types of sentences that can be handled, and insufficient consideration of the interdependency among attributes. To resolve these problems, this paper proposes a novel approach that treats activity extraction as a sequence labeling problem and automatically generates its own training data. The approach is domain-independent and scalable, and it requires no hand-tagged data. Because the positions and the number of attributes in activity sentences need not be fixed in advance, the approach can extract all attributes and the transitions between activities in a single pass over its corpus.
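To make the sequence labeling framing concrete, the following is a minimal sketch of how per-token labels could be grouped into activity attributes once a labeler has produced them. It is not the paper's implementation: the B-/I-/O tagging scheme, the label names, the function `group_attributes`, and the English example sentence (the paper works on Japanese text) are all assumptions made for illustration. The trained sequence labeler itself is omitted; the labels are supplied by hand.

```python
from collections import defaultdict

# Hypothetical label set: the abstract names these five attributes,
# but the B-/I-/O tagging scheme used here is an assumption.
ATTRIBUTES = {"ACTOR", "ACTION", "OBJECT", "TIME", "LOCATION"}


def group_attributes(tokens, labels):
    """Group contiguous B-/I- labeled tokens into attribute spans.

    Returns a dict mapping each attribute name to the list of text spans
    found for it, so neither the position nor the number of attributes
    in a sentence has to be fixed in advance.
    """
    spans = defaultdict(list)
    current_attr, current_tokens = None, []
    for token, label in zip(tokens, labels):
        prefix, _, attr = label.partition("-")
        # An I- tag continues the span only if it matches the open attribute.
        if prefix == "I" and attr == current_attr:
            current_tokens.append(token)
            continue
        # Any other tag closes the span that is currently open, if any.
        if current_attr is not None:
            spans[current_attr].append(" ".join(current_tokens))
        # A B- tag with a known attribute opens a new span; anything else
        # (O, or a stray I- tag) leaves no span open.
        if prefix == "B" and attr in ATTRIBUTES:
            current_attr, current_tokens = attr, [token]
        else:
            current_attr, current_tokens = None, []
    if current_attr is not None:
        spans[current_attr].append(" ".join(current_tokens))
    return dict(spans)


# Hand-labeled example standing in for the output of a sequence labeler.
tokens = ["yesterday", "I", "bought", "a", "book", "in", "Kyoto"]
labels = ["B-TIME", "B-ACTOR", "B-ACTION", "B-OBJECT", "I-OBJECT", "O", "B-LOCATION"]
result = group_attributes(tokens, labels)
```

Because each token carries its own label, a sentence that omits the time or mentions two objects is handled by the same loop without any per-pattern template.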
