Learning Structural Classification Rules for Web-Page Categorization

Content-related metadata plays an important role in the effort to develop intelligent web applications. One of the most established forms of providing content-related metadata is the assignment of web pages to content categories. We describe the Spectacle system for classifying individual web pages on the basis of their syntactic structure. This classification requires the specification of classification rules that associate common page structures with predefined classes. In this paper, we propose an approach for the automatic acquisition of these classification rules using techniques from inductive logic programming, and we describe experiments in applying the approach to an existing web-based information system.
