Learning Rules for Conceptual Structure on the Web

This paper presents an infrastructure and methodology for extracting conceptual structure from Web pages, which consist mainly of HTML tags and incomplete text. Human readers can easily read a Web page and grasp the conceptual structure of the underlying data, but they lack the time and patience to handle large amounts of data. Machines, in contrast, find it extremely difficult to determine the content of Web pages accurately because they lack an understanding of context and semantics. Our work processes Web data and extracts its underlying conceptual structure, in particular the relationships between ontological concepts, using Inductive Logic Programming. Capturing these conceptual structures helps automate the processing of the large amount of data on the Web.
