Application of Text Summarization techniques to the Geographical Information Retrieval task

Automatic Text Summarization has been shown to be useful for Natural Language Processing tasks such as Question Answering and Text Classification, and for related fields of computer science such as Information Retrieval. Since Geographical Information Retrieval can be considered an extension of Information Retrieval, summary generation could be integrated into these systems as an intermediate stage whose purpose is to reduce document length. In this way, the time needed to search for information is expected to decrease while relevant documents are still retrieved. In this paper we therefore propose generating two types of summaries (generic and geographical) at several compression rates in order to evaluate their effectiveness in the Geographical Information Retrieval task. The evaluation was carried out using GeoCLEF as the evaluation framework and from an Information Retrieval perspective, without considering the geo-reranking phase commonly used in these systems. Although single-document summarization did not perform well in general, the slight improvements obtained for some of the proposed summary types, particularly those based on geographical information, lead us to believe that integrating Text Summarization with Geographical Information Retrieval may be beneficial. Consequently, the experimental set-up developed in this work can serve as a basis for further investigation in this field.
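
To make the idea of generic and geographical summaries at a given compression rate concrete, the following is a minimal illustrative sketch, not the summarizer used in this work. It assumes a simple term-frequency sentence scorer, treats the compression rate as the fraction of sentences kept, and builds the geographical variant by boosting sentences that mention a place name. The names `summarize`, `place_names`, and `geo_boost` are hypothetical, and the place-name list stands in for a real gazetteer or geographic entity recognizer.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased alphabetic tokens only; no stemming or stop-word removal.
    return re.findall(r"[a-z]+", sentence.lower())


def summarize(text, compression_rate, place_names=None, geo_boost=2.0):
    """Extractive summary keeping roughly `compression_rate` of the sentences.

    Assumption: a sentence's score is the mean document frequency of its
    terms. If `place_names` is given (hypothetical gazetteer), sentences
    mentioning a place are multiplied by `geo_boost`, which yields a
    'geographical' summary; otherwise the summary is 'generic'.
    """
    sentences = split_sentences(text)
    if not sentences:
        return ""
    freq = Counter(tok for s in sentences for tok in tokenize(s))
    places = {p.lower() for p in (place_names or [])}

    scores = []
    for s in sentences:
        toks = tokenize(s)
        score = sum(freq[t] for t in toks) / len(toks) if toks else 0.0
        if places and any(t in places for t in toks):
            score *= geo_boost
        scores.append(score)

    # Keep at least one sentence, then restore original document order.
    keep = max(1, math.ceil(compression_rate * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return " ".join(sentences[i] for i in sorted(top))
```

In a retrieval pipeline of the kind described above, such a function would be applied to every document before indexing, once per compression rate and summary type, so that the retrieval engine indexes the shortened text instead of the full document.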
