Semantic Structure Analysis of Web Documents

[1]  W. Bruce Croft,et al.  Text Segmentation by Topic , 1997, ECDL.

[2]  James Allan,et al.  Approaches to passage retrieval in full text information systems , 1993, SIGIR.

[3]  Jan-Ming Ho,et al.  Discovering informative content blocks from Web documents , 2002, KDD.

[4]  Soumen Chakrabarti,et al.  Accelerated focused crawling through online relevance feedback , 2002, WWW.

[5]  Massimo Melucci,et al.  Web Document Retrieval Using Passage Retrieval, Connectivity Information, and Automatic Link Weighting--TREC-9 Report , 2000, TREC.

[6]  Alistair Moffat,et al.  Efficient Retrieval of Partial Documents , 1995, Inf. Process. Manag.

[7]  Jon Kleinberg,et al.  Authoritative sources in a hyperlinked environment , 1999, SODA '98.

[8]  James P. Callan,et al.  Passage-level evidence in document retrieval , 1994, SIGIR '94.

[9]  Pabitra Mitra,et al.  Extracting semantic structure of web documents using content and visual information , 2005, WWW '05.

[10]  Ada Wai-Chee Fu,et al.  Finding Structure and Characteristics of Web Documents for Classification , 2000, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[11]  Kui-Lam Kwok,et al.  TREC-9 Cross Language, Web and Question-Answering Track Experiments using PIRCS , 2000, TREC.

[12]  Ross Wilkinson,et al.  Effective retrieval of structured documents , 1994, SIGIR '94.

[13]  Baoyao Zhou,et al.  Function-based object model towards website adaptation , 2001, WWW '01.

[14]  Gerard Salton,et al.  Research and Development in Information Retrieval , 1982, Lecture Notes in Computer Science.

[15]  David W. Embley,et al.  Record-boundary discovery in Web documents , 1999, SIGMOD '99.

[16]  Wei-Ying Ma,et al.  Extracting Content Structure for Web Pages Based on Visual Representation , 2003, APWeb.

[17]  Marti A. Hearst.  Multi-Paragraph Segmentation of Expository Text , 1994, ACL.

[18]  Soumen Chakrabarti,et al.  Enhanced topic distillation using text, markup tags, and hyperlinks , 2001, SIGIR '01.

[19]  Víctor Pàmies,et al.  Open Directory Project , 2003.

[20]  Soumen Chakrabarti,et al.  Integrating the document object model with hyperlinks for enhanced topic distillation and information extraction , 2001, WWW '01.

[21]  Timo Laakko,et al.  Two approaches to bringing Internet services to WAP devices , 2000, Comput. Networks.

[22]  Wei-Ying Ma,et al.  VIPS: a Vision-based Page Segmentation Algorithm , 2003.

[23]  Justin Zobel,et al.  Effective ranking with arbitrary passages , 2001.