Web page summarization for handheld devices: a natural language approach

Summarization of web pages is an interesting topic from both academic and commercial points of view. Academically, it is challenging to create a summary of a document (e.g. a web page) that is highly structured and has multimedia components in it. From the commercial point of view, it is advantageous to summarize web pages for viewing on small-display devices such as PDAs and cell phones. Summarization not only makes web browsing and navigation easier, but also makes browsing faster, as complete web pages need not be downloaded before viewing. In this paper, a novel combination of natural language and non-natural language based summarization techniques is used to automatically generate an intelligent re-authored display of web pages in real time.
