Automatic Mining of Human Activity Attributes from Weblogs

In this paper, we define an activity by five basic attributes: actor, action, object, time, and location. Our goal is to describe a method that automatically extracts all of these attributes from each sentence retrieved from Japanese weblogs. Previous work has several limitations: high setup cost, inability to extract all attributes, restrictions on the types of sentences that can be handled, and insufficient consideration of the interdependency among attributes. To resolve these problems, we propose a novel approach that uses conditional random fields and self-supervised learning. The approach treats activity extraction as a sequence labeling problem; it is domain-independent and scalable, and it requires no hand-tagged data. Because the positions and the number of attributes in an activity sentence need not be fixed in advance, the approach can extract all attributes in a single pass over the corpus. In addition, by converting complex sentences into simpler ones, it can handle the complex sentences found in Japanese weblogs. In an experiment, the approach achieves high precision (activity: 88.87%; attributes: over 90%).
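To make the sequence labeling framing concrete, the sketch below shows how a per-token label sequence could be grouped back into the five attributes. It is only an illustration under stated assumptions: the BIO-style label names (`B-actor`, `I-object`, `O`, and so on), the function `decode_attributes`, and the English example sentence are hypothetical and are not taken from the paper, which works on Japanese text and whose actual label scheme is not given in the abstract. The labels are supplied by hand here; in the described approach they would come from a trained conditional random field.

```python
from collections import defaultdict

# Hypothetical label scheme: "B-<attr>" opens a span, "I-<attr>" continues it,
# and "O" marks tokens that belong to no attribute.
ATTRIBUTES = ("actor", "action", "object", "time", "location")


def decode_attributes(tokens, labels):
    """Group BIO-labeled tokens into attribute spans.

    Returns a dict mapping each attribute that occurs to a list of its spans.
    Neither the number nor the position of attributes is fixed in advance.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    spans = defaultdict(list)
    current_attr = None
    current_tokens = []

    def flush():
        # Close the span currently being built, if any.
        nonlocal current_attr, current_tokens
        if current_attr is not None:
            spans[current_attr].append(" ".join(current_tokens))
        current_attr, current_tokens = None, []

    for token, label in zip(tokens, labels):
        if label == "O":
            flush()
            continue
        prefix, _, attr = label.partition("-")
        if prefix not in ("B", "I") or attr not in ATTRIBUTES:
            raise ValueError(f"unknown label: {label!r}")
        # A "B-" label, or an "I-" label for a different attribute,
        # starts a new span.
        if prefix == "B" or attr != current_attr:
            flush()
            current_attr = attr
        current_tokens.append(token)

    flush()
    return dict(spans)


# Hand-labeled English example, for illustration only.
tokens = ["Yesterday", "I", "bought", "a", "new", "camera", "in", "Akihabara"]
labels = ["B-time", "B-actor", "B-action",
          "B-object", "I-object", "I-object", "O", "B-location"]
activity = decode_attributes(tokens, labels)
```

Because the decoder only follows the labels, a sentence with a missing attribute or with attributes in a different order is handled by the same single pass, which mirrors the property claimed for the approach.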
