Information Extraction for the Semantic Web

The World Wide Web represents a universe of knowledge and information. Unfortunately, querying and accessing the desired information is not straightforward: languages and tools for accessing, extracting, transforming, and syndicating it are required. The Web should be useful not merely for human consumption but also for machine communication. Therefore, powerful and user-friendly tools are needed, based on expressive languages, for extracting and integrating information from various Web sources or, more generally, from heterogeneous sources. This tutorial gives an introduction to the Web technologies required in this context and presents various approaches and techniques used in information extraction and integration. Moreover, sample applications in various domains motivate the topics discussed, and the tutorial illustrates how to provide data instances for the Semantic Web.
