Towards a new summarization approach for search engine results: An application for Turkish

With the drastic increase in information sources available on the Internet, people with different backgrounds share the same problem: locating information that is useful for their actual needs. Search engines make this task easier only in certain ways; users still have to sift through the results themselves. At this point, automatic summarization can complement the task of search engines. In this paper, we consider a new summarization approach for Web information retrieval, namely structure-preserving and query-biased summarization. We evaluate this approach on Turkish Web documents using TREC-like topics defined for Turkish. The results of the task-based evaluation show that this approach yields a significant improvement over Google snippets and unstructured query-biased summaries in terms of F-measure under the relevance prediction approach.
