A Navigational and Structural Approach for Extracting Contents from Web Portals

In a semantic Web portal, contents are described and organized according to domain ontologies, and are usually extracted from traditional portals. However, with the growing amount of information generated on the Web each day, keeping semantic portals up to date remains a major challenge, since this task lacks mechanisms for extracting and integrating information dynamically. This paper proposes a strategy to help promote interoperability between portals. It consists of extracting contents from different Web sites in a specific domain in order to instantiate a domain ontology, which is then used to update and/or populate a semantic portal. The extraction is carried out by analyzing the navigational and structural characteristics of traditional portals that have some semantic potential. To evaluate this strategy, a tool named NECOW was implemented. NECOW's performance was compared with that of Google's advanced search mode, and the results were promising.
