Paper information - Towards a new summarization approach for search engine results: An application for Turkish

With the drastic increase in information sources available on the Internet, people with different backgrounds share the same problem: locating information that is useful for their actual needs. Search engines make this task easier only in certain ways; people still have to sift through the results themselves. At this point, automatic summarization can complement search engines. In this paper, we consider a new summarization approach for Web information retrieval: structure-preserving, query-biased summarization. We evaluate this approach on Turkish Web documents using TREC-like topics defined for Turkish. The results of the task-based evaluation, which uses the relevance prediction approach, show that this approach significantly improves on Google snippets and unstructured query-biased summaries in terms of F-measure.

F.C. Pembe | T. Gungor
