On Extracting Link Information of Relationship Instances from a Web Site

Web pages from a web site can often be associated with concepts in an ontology, and pairs of web pages can be associated with relationships between those concepts. With such associations, web pages can be searched, browsed, or even reorganized based on their concept and relationship labels. In this paper, we investigate the problem of extracting link information of relationship instances from a web site. We define the notion of a link chain and formulate the link chain extraction problem. We propose an extraction method based on sequential covering to solve this problem, and we present the method together with experiments that evaluate its performance. We have applied the method to extract link chain information from the Yahoo! Movie Web Site with very promising results.
