Ontology extraction and conceptual modeling for web information

Much work has been done on extracting data content from the Web, but less attention has been given to extracting the conceptual schemas or ontologies that underlie Web pages. The goal of the WebOntEx (Web ontology extraction) project is to make progress toward semiautomatically extracting Web ontologies by analyzing a set of Web pages in the same application domain. We treat the ontology as a complete schema of the domain concepts. Our ontology metaconcepts are based on the extended entity-relationship (EER) model: concepts are classified into entity types, relationships, attributes, and superclass/subclass hierarchies. WebOntEx attempts to extract ontology concepts by analyzing the use of HTML tags and by applying part-of-speech tagging to the page text. To do so, it combines heuristic rules with machine learning techniques, in particular inductive logic programming (ILP).
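To make the idea of tag-based heuristics concrete, the following is a minimal sketch of how HTML markup might suggest candidate EER metaconcepts. It is not the WebOntEx rule set: the tag-to-metaconcept mapping `TAG_HINTS`, the class `ConceptCandidateParser`, and the function `extract_candidates` are hypothetical names chosen for illustration, and part-of-speech tagging and ILP are left out entirely.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML tags to EER metaconcepts.
# An assumption for illustration, not WebOntEx's actual heuristics.
TAG_HINTS = {
    "h1": "entity_type",
    "h2": "entity_type",
    "th": "attribute",
    "b": "attribute",
}


class ConceptCandidateParser(HTMLParser):
    """Collects (text, metaconcept) pairs for text found inside hinted tags."""

    def __init__(self):
        super().__init__()
        self._stack = []      # hinted tags that are currently open
        self.candidates = []  # (text, metaconcept) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in TAG_HINTS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        # Normalize labels such as "Instructor:" to "Instructor".
        text = data.strip().rstrip(":")
        if text and self._stack:
            self.candidates.append((text, TAG_HINTS[self._stack[-1]]))


def extract_candidates(html):
    """Return candidate concepts suggested by the markup of one page."""
    parser = ConceptCandidateParser()
    parser.feed(html)
    parser.close()
    return parser.candidates


page = (
    "<h1>Course</h1>"
    "<table><tr><th>Title</th><th>Instructor:</th></tr>"
    "<tr><td>Databases</td><td>Smith</td></tr></table>"
)
for text, metaconcept in extract_candidates(page):
    print(metaconcept, text)
```

In this sketch, heading text is proposed as an entity type and table-header or bold text as an attribute, while table cell values are ignored as instance data. A fuller pipeline would presumably aggregate such candidates over many pages from the same domain before any learning step.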
