Automatic Mining of Human Activity and Its Relationships from CGM

The goal of this paper is to describe a method for automatically extracting all basic attributes of an activity, namely actor, action, object, time, and location, together with the relationships (transition and cause) between activities, from each sentence retrieved from Japanese CGM (consumer generated media). Previous work had several limitations: high setup cost, inability to extract all attributes, restrictions on the types of sentences that can be handled, insufficient consideration of interdependency among attributes, and inability to extract causes between activities. To resolve these problems, this paper proposes a novel approach that treats activity extraction as a sequence labeling problem and automatically generates its own training data. This approach is domain-independent and scalable, and it requires no hand-tagged data. Because the positions and the number of attributes in activity sentences need not be fixed, the approach can extract all attributes and relationships between activities in only a single pass over its corpus. Additionally, by converting complex sentences into simpler ones, removing stop words, and utilizing HTML tags, the Google Maps API, and Wikipedia, the proposed approach can deal with complex sentences retrieved from Japanese CGM.
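
To make the sequence labeling formulation concrete, the following is a minimal sketch of how per-token labels could be grouped back into activity attributes. It is an illustration under stated assumptions, not the paper's implementation: the BIO label scheme, the label names, and the function `decode_activity` are hypothetical, and English tokens stand in for segmented Japanese text. The labels themselves would come from a trained sequence labeler, which is not shown here.

```python
from collections import defaultdict

# Hypothetical label set: the abstract names these five attributes,
# but the BIO encoding used here is an assumption for illustration.
ATTRIBUTES = {"ACTOR", "ACTION", "OBJECT", "TIME", "LOCATION"}


def decode_activity(tokens, labels):
    """Group BIO-labeled tokens into attribute spans.

    Returns a dict mapping each attribute to the list of spans found,
    so neither the position nor the number of attributes is fixed.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    spans = defaultdict(list)
    current_attr = None
    current_tokens = []

    def flush():
        # Close the span being built, if any, and reset the state.
        nonlocal current_attr, current_tokens
        if current_attr is not None:
            spans[current_attr].append(" ".join(current_tokens))
        current_attr, current_tokens = None, []

    for token, label in zip(tokens, labels):
        prefix, _, attr = label.partition("-")
        if prefix == "B" and attr in ATTRIBUTES:
            flush()
            current_attr, current_tokens = attr, [token]
        elif prefix == "I" and attr == current_attr:
            current_tokens.append(token)
        else:
            # "O" labels and inconsistent "I-" labels end the current span.
            flush()
    flush()
    return dict(spans)


# Hand-written labels standing in for the output of a trained labeler.
tokens = ["Yesterday", "I", "ate", "ramen", "in", "Kyoto"]
labels = ["B-TIME", "B-ACTOR", "B-ACTION", "B-OBJECT", "O", "B-LOCATION"]
activity = decode_activity(tokens, labels)
```

Because each token carries its own label, a sentence with a missing attribute or with several objects is handled by the same decoding loop, which is the property the abstract attributes to the sequence labeling view.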
