Summarizing Web Sites Automatically

This research is directed towards automating the Web Site summarization task. To achieve this objective, an approach, which applies machine learning and natural language processing techniques, is employed. The automatically generated summaries are compared to manually constructed summaries from DMOZ Open Directory Project. The comparison is performed via a formal evaluation process involving human subjects. Statistical evaluation of the results demonstrates that the automatically generated summaries are as informative as human authored DMOZ summaries and significantly more informative than home page browsing or time limited site browsing.

[1]  Mary Ellen Okurowski,et al.  A Scalable Summarization System Using Robust NLP , 1997 .

[2]  Evangelos E. Milios,et al.  World Wide Web site summarization , 2004, Web Intell. Agent Syst..

[3]  Eric Brill,et al.  A Simple Rule-Based Part of Speech Tagger , 1992, HLT.

[4]  Elizabeth D. Liddy,et al.  Advances in Automatic Text Summarization , 2001, Information Retrieval.

[5]  Christopher J. Fox,et al.  Lexical Analysis and Stoplists , 1992, Information Retrieval: Data Structures & Algorithms.

[6]  Campbell B. Read,et al.  Zipf's Law , 2004 .

[7]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[8]  Hideki Mima,et al.  Automatic recognition of multi-word terms:. the C-value/NC-value method , 2000, International Journal on Digital Libraries.

[9]  Andreas Paepcke,et al.  Seeing the whole in parts: text summarization for web browsing on handheld devices , 2001, WWW '01.

[10]  Cécile Paris,et al.  Automatically summarising Web sites: is there a way around it? , 2000, CIKM '00.

[11]  Karen Sparck Jones,et al.  Book Reviews: Evaluating Natural Language Processing Systems: An Analysis and Review , 1996, CL.

[12]  Vibhu O. Mittal,et al.  OCELOT: a system for summarizing Web pages , 2000, SIGIR '00.

[13]  Regina Barzilay,et al.  Using Lexical Chains for Text Summarization , 1997 .

[14]  Inderjeet Mani Recent developments in text summarization , 2001, CIKM '01.

[15]  Ricardo Baeza-Yates,et al.  Information Retrieval: Data Structures and Algorithms , 1992 .

[16]  Jade Goldstein-Stewart,et al.  Summarizing text documents: sentence selection and evaluation metrics , 1999, SIGIR '99.

[17]  Dragomir R. Radev,et al.  Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies , 2000, ArXiv.