Task Specific Semantic Views: Extracting and Integrating Contextual Metadata from the Web

Tasks and working scenarios on the desktop involve specific context information that is useful for finding documents related to that context. Automating the retrieval and generation of this context information is important because time-consuming manual annotation is not feasible in everyday work. This paper focuses on automatically extracting and integrating contextual information from the web pages used in such working scenarios. The key observation is that in such scenarios we often use a set of web sites to obtain relevant information, implicitly syndicating their data into a coherent, scenario-specific information space. We show how these data can be extracted automatically from the web pages stored in local browser caches, using appropriate query wrappers over these pages. The extracted data are then combined into a task-specific semantic view, building on schema integration rules that follow a global-as-view approach together with view materialization, and are transformed into RDF metadata to enhance contextualized search on the desktop. We describe both the conceptual framework and our current prototype, and conclude with a discussion of further research issues.
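
The pipeline outlined above (wrappers over cached pages, a global-as-view rule, view materialization, RDF output) can be illustrated with a minimal sketch. It is not the prototype described in this paper: the page contents, the wrapper patterns, the global relation `publication(title, author, year)`, and the `http://example.org/` vocabulary are all hypothetical, chosen only to make the steps concrete.

```python
import re

# Hypothetical stand-ins for two pages found in a local browser cache.
CACHED_PAGES = {
    "conference": '<li><span class="title">Example Paper</span>'
                  '<span class="author">A. Author</span></li>',
    "library": '<tr><td class="t">Example Paper</td><td class="y">2005</td></tr>',
}


def wrap_conference(html):
    """Query wrapper for the first source: returns (title, author) tuples."""
    pattern = r'<span class="title">(.*?)</span><span class="author">(.*?)</span>'
    return re.findall(pattern, html)


def wrap_library(html):
    """Query wrapper for the second source: returns (title, year) tuples."""
    pattern = r'<td class="t">(.*?)</td><td class="y">(.*?)</td>'
    return re.findall(pattern, html)


def materialize_view(pages):
    """Materialize the global relation defined as a view over the sources.

    Assumed global-as-view rule:
        publication(T, A, Y) :- conference(T, A), library(T, Y)
    """
    conference = wrap_conference(pages["conference"])
    years_by_title = dict(wrap_library(pages["library"]))
    return [
        (title, author, years_by_title[title])
        for title, author in conference
        if title in years_by_title  # join on the shared title attribute
    ]


def to_ntriples(view, base="http://example.org/"):
    """Serialize the materialized view as N-Triples lines (hypothetical vocabulary)."""
    lines = []
    for index, (title, author, year) in enumerate(view):
        subject = f"<{base}pub/{index}>"
        lines.append(f'{subject} <{base}title> "{title}" .')
        lines.append(f'{subject} <{base}author> "{author}" .')
        lines.append(f'{subject} <{base}year> "{year}" .')
    return lines


if __name__ == "__main__":
    for triple in to_ntriples(materialize_view(CACHED_PAGES)):
        print(triple)
```

Regular expressions keep the sketch short. A wrapper over real cached pages would need a proper HTML parser and escaping of literal values before serialization.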
