Extraction of Hidden Semantics from Web Pages

One of the main limitations when accessing the web is its lack of explicit structure, the presence of which would help in understanding data semantics. Here, an approach to extracting a logical schema from web pages is presented. It defines a page model in which the content of a page is divided into "logical" sections, i.e. parts of a page that each collect related information. The model aims to cover both traditional static HTML pages and the content of dynamic pages.
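To make the idea of "logical" sections concrete, the following is a minimal sketch and not the method of the paper, whose details are not given here. It assumes, purely for illustration, that heading tags mark the boundaries between sections; the names `SectionSplitter` and `split_into_sections` are hypothetical. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Assumption for this sketch: a heading tag opens a new logical section.
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionSplitter(HTMLParser):
    """Group the text of a page into sections delimited by headings."""

    def __init__(self):
        super().__init__()
        # The first section collects any text that precedes the first heading.
        self.sections = [{"title": None, "text": []}]
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.sections.append({"title": "", "text": []})
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        chunk = data.strip()
        if not chunk:
            return
        current = self.sections[-1]
        if self._in_heading:
            current["title"] += chunk
        else:
            current["text"].append(chunk)


def split_into_sections(html):
    """Return the non-empty sections found in an HTML string."""
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return [s for s in parser.sections if s["title"] or s["text"]]


page = "<h1>News</h1><p>Item one</p><p>Item two</p><h2>Contacts</h2><p>Mail us</p>"
for section in split_into_sections(page):
    print(section["title"], section["text"])
```

A heading-based split is only one possible heuristic. Pages that lay out related information with tables or nested containers, or that generate their content dynamically, would need a richer notion of section boundary than this sketch provides.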
