Automatic Generation of Ontology from the Deep Web

The term "deep Web" refers to Web pages that are not accessible to search engines, for example because they are generated dynamically in response to queries submitted through Web forms or Web services. Existing automated Web crawlers cannot index these pages, so they remain hidden from Web search engines. Our goal is to annotate such deep Web services (i.e., the content generation interfaces of hidden Web sources) with semantic indexing by constructing domain-specific ontologies that represent the contents of deep Web sources. Fully automatic derivation of ontologies from Web sources, without human review, remains a challenging research problem. We present a novel approach to automatically building a large, yet domain-specific, ontology by interweaving sub-taxonomies of WordNet with domain-specific information extracted from deep Web service pages. Our algorithms extract domain concepts from deep Web sources and augment them with concepts and relationships from WordNet to construct ontology fragments, which are structurally directed acyclic graphs (DAGs). An iterative process of extracting WordNet concepts and relationships and bridging concept gaps then ties the disparate domain concepts and ontology fragments together into a single ontology. Using eight domains (airfares, jobs, etc.) from a well-known test-bed, our algorithms constructed an ontology of 1692 concepts from deep Web sources and 4434 concepts from WordNet. The ontology is expressed in the OWL format to support semantic Web searches.
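
The idea of tying disparate domain concepts together through WordNet can be illustrated with a minimal sketch. It is not the authors' algorithm: the hypernym table `TOY_HYPERNYMS` is a hypothetical, hand-written stand-in for WordNet, and the function names `build_ontology` and `roots` are assumptions made for illustration. The sketch walks "is-a" links upward from each extracted domain concept until the chains meet, which yields a DAG containing both the domain concepts and the bridging concepts.

```python
from collections import defaultdict

# Hypothetical toy hypernym table standing in for WordNet (child -> parents).
TOY_HYPERNYMS = {
    "airfare": ["fare"],
    "fare": ["charge"],
    "charge": ["cost"],
    "salary": ["payment"],
    "payment": ["cost"],
    "cost": ["entity"],
    "airport": ["facility"],
    "facility": ["entity"],
}


def build_ontology(domain_concepts, hypernyms):
    """Walk hypernym links upward from each domain concept.

    Returns the is-a edges (child -> set of parents) and the set of all
    concepts reached, i.e. domain concepts plus bridging concepts.
    """
    is_a = defaultdict(set)
    seen = set(domain_concepts)
    frontier = list(domain_concepts)
    while frontier:
        concept = frontier.pop()
        for parent in hypernyms.get(concept, []):
            is_a[concept].add(parent)
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return dict(is_a), seen


def roots(is_a, concepts):
    """Concepts with no parent; a single root means the fragments are joined."""
    return {c for c in concepts if not is_a.get(c)}


domain = ["airfare", "salary", "airport"]
edges, concepts = build_ontology(domain, TOY_HYPERNYMS)
bridging = concepts - set(domain)
print(sorted(bridging))
print(roots(edges, concepts))
```

In this toy run, "airfare" and "salary" are first joined under "cost", and "airport" is joined with them only at the top-level concept, so the three fragments end up in one graph with a single root.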
