Self-supervised capturing of users' activities from weblogs

The goal of this paper is to describe a method for automatically extracting all the basic attributes of an activity, namely actor, action, object, time and location, from Japanese weblogs. Sentences retrieved from weblogs are often diverse, complex and syntactically incorrect, and they contain emoticons and new words. Several previous works have tried to extract users' activities from sentences retrieved from the web and weblogs. However, these works have several limitations: they cannot extract infrequent activities, they have a high setup cost, they restrict the types of sentences that can be handled, and they require a list of objects and actions to be prepared in advance. To resolve these problems, we propose a novel approach that treats activity extraction as a sequence labelling problem and automatically generates its own training data. This approach can extract infrequent activities, is scalable, and does not need any hand-tagged data. Since it does not require the positions and the number of the attributes in activity sentences to be fixed, it can extract all attributes with high recall.
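
To illustrate what treating activity extraction as a sequence labelling problem can look like, the following minimal sketch assigns one attribute label per token and then groups the labelled tokens into an activity record. The label names (ACTOR, ACTION, OBJECT, TIME, LOCATION, and O for tokens outside any attribute), the helper `decode_activity`, and the English example sentence are assumptions made for illustration only; the paper works on Japanese weblogs, and the labels here are written by hand rather than predicted by a trained model.

```python
from collections import defaultdict


def decode_activity(tokens, labels):
    """Group tokens by their attribute label, skipping tokens labelled 'O'.

    Hypothetical helper: it only shows how a per-token label sequence
    maps back to activity attributes, not how the labels are predicted.
    """
    attributes = defaultdict(list)
    for token, label in zip(tokens, labels):
        if label != "O":
            attributes[label].append(token)
    # Join multi-token attributes into a single string per attribute.
    return {name: " ".join(words) for name, words in attributes.items()}


# Hand-written toy input (assumption); a sequence labeller would normally
# produce the label sequence.
tokens = ["Yesterday", "I", "bought", "a", "book", "in", "Kyoto"]
labels = ["TIME", "ACTOR", "ACTION", "O", "OBJECT", "O", "LOCATION"]

activity = decode_activity(tokens, labels)
print(activity)
```

Because every token carries its own label, the attributes may appear in any order and in any number, which is the property the abstract relies on when it says that the positions and the number of attributes need not be fixed.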
