Behavior extraction from tweets using character N-gram models

Human daily activities are now recorded by ICT devices in many kinds of data representations, collectively called lifelogs. Because raw lifelogs are hard to handle, there is strong demand for retrieving useful information from them. Extracting human activities from these logs promises to enrich our lives, since context-aware services can be provided based on the extracted user activities. Recently, many people post messages called tweets on Twitter to show what they are doing, thinking, and feeling. Because many people post tweets frequently every day, tweets have the potential to serve as a record of human activities. This paper focuses on retrieving human behavior from tweets. Tweets are limited to a short length, which causes some difficulties: to fit within the constraint, users rely on domain-specific terms and post grammatically incorrect sentences. This makes tweets hard to analyze with grammar-based methods or with dictionaries. To tackle these problems, we apply character n-gram tokenization and a naive Bayes classifier to extract appropriate behavioral information from tweets. With an n-gram tokenizer, domain-specific words can be identified and incorrect grammar can be handled. Our approach is examined using real tweets in Japanese. Experiments were carried out to show its feasibility, and the precision, recall, and F-measure show promising results. At this point, our system has been applied only to Japanese tweets, but it is applicable to other languages as well.
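The combination of character n-gram tokenization and a naive Bayes classifier described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the n-gram length (n = 2), the add-one smoothing, the multinomial model, the behavior labels, and the English toy tweets are all assumptions introduced here, since the abstract does not specify them. The helper `precision_recall_f` shows how the three reported measures are usually computed for a single target label.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return all overlapping character n-grams of text (no word segmentation)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class CharNgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams (assumed setup)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n                      # n-gram length (assumption)
        self.alpha = alpha              # additive smoothing constant (assumption)
        self.class_counts = Counter()   # number of training tweets per label
        self.gram_counts = defaultdict(Counter)  # n-gram counts per label
        self.vocab = set()              # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.class_counts[label] += 1
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def log_scores(self, text):
        """Log prior plus summed log likelihoods of the n-grams, per label."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def precision_recall_f(gold, predicted, target):
    """Precision, recall, and F-measure for one target label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Invented toy tweets with hypothetical behavior labels (not data from the paper).
TRAIN = [
    ("eating ramen for lunch", "eating"),
    ("had sushi 4 dinner", "eating"),
    ("runnin in the park", "exercise"),
    ("jogging b4 work", "exercise"),
]

model = CharNgramNaiveBayes(n=2).fit(
    [text for text, _ in TRAIN], [label for _, label in TRAIN]
)
```

Because the tokenizer works on raw characters, a misspelled or abbreviated tweet such as "eatin ramen" still shares most of its bigrams with the training examples, which is the property the abstract relies on for informal text. The same `char_ngrams` function applies unchanged to Japanese strings, where no whitespace segmentation is available.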
