Automatic modeling of user's real world activities from the web for semantic IR

We have been developing a task-based service navigation system that offers the user services relevant to the task he or she wants to perform. The system lets the user make a request concrete by locating it in a task model developed by human experts. In this study, to reduce the cost of collecting a wide variety of activities, we investigate the automatic modeling of users' real-world activities from the web. To extract the widest possible variety of activities with high precision and recall, we investigate the appropriate number of contents and resources to extract from. Our results show that we do not need to examine the entire web, which is too time-consuming; only a limited number of search results from blog contents (e.g., 900 of 21,000,000 search results) is needed. In addition, to estimate the hierarchical relationships present in the activity model with the lowest possible error rate, we propose a method that divides the representation of each activity into a noun part and a verb part and calculates the mutual information between them. The results show that the proposed method captures almost 80% of the hierarchical relationships.
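
The abstract does not give the exact formula used, so the following is only a minimal sketch of one common reading of "mutual information" between a noun part and a verb part: pointwise mutual information (PMI) estimated from co-occurrence counts. The function name `pmi_table` and the toy (noun, verb) pairs are hypothetical and are not taken from the paper; how such scores are turned into hierarchical relationships is not described in this section and is not attempted here.

```python
import math
from collections import Counter


def pmi_table(pairs):
    """Return PMI(noun, verb) in bits for every observed (noun, verb) pair.

    PMI(n, v) = log2( P(n, v) / (P(n) * P(v)) ), with probabilities
    estimated as relative frequencies over the list of pairs.
    This is the standard PMI definition, assumed here for illustration.
    """
    pair_counts = Counter(pairs)
    noun_counts = Counter(noun for noun, _ in pairs)
    verb_counts = Counter(verb for _, verb in pairs)
    total = len(pairs)

    scores = {}
    for (noun, verb), count in pair_counts.items():
        p_joint = count / total
        p_noun = noun_counts[noun] / total
        p_verb = verb_counts[verb] / total
        scores[(noun, verb)] = math.log2(p_joint / (p_noun * p_verb))
    return scores


# Hypothetical activities already split into (noun part, verb part).
pairs = [
    ("ticket", "buy"),
    ("ticket", "buy"),
    ("ticket", "reserve"),
    ("movie", "watch"),
    ("movie", "watch"),
    ("book", "buy"),
]

scores = pmi_table(pairs)
for (noun, verb), value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{verb} {noun}: {value:.3f}")
```

In this toy data, "watch movie" scores highest because "movie" and "watch" occur only with each other, whereas "buy" is shared between "ticket" and "book", which lowers the score for "buy ticket".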
