Using Windmill Expansion for Document Retrieval

SEMIOTIKS aims to use online information to support the crucial decision-making of the military and civilian agencies involved in the humanitarian removal of landmines in areas of conflict throughout the world. An analysis of the type of information required for such a task has given rise to four main areas of research: information retrieval, document annotation, summarisation and visualisation. The first stage of the research has focused on information retrieval, and a new algorithm, "Windmill Expansion" (WE), has been proposed for this purpose. The algorithm uses retrieval feedback techniques for automated query expansion in order to improve the effectiveness of information retrieval. WE is based on the extraction of human-generated written phrases for automated query expansion. Top and Second Level expansion terms have been generated and their usefulness evaluated. The evaluation has concentrated on measuring the degree of overlap between the retrieved URLs: the smaller the overlap, the more useful the information provided. The Top Level expansion terms were found to yield 90% useful URLs and the Second Level terms 83%. Although the proportion of useful URLs declined from the Top Level to the Second Level, the quantity of relevant information retrieved increased. The originality of SEMIOTIKS lies in its use of the WE algorithm to help users who are not domain experts automatically explore domain words for relevant and precise information retrieval.
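The overlap-based evaluation described above can be illustrated with a small sketch. The abstract does not give the exact formula used, so the measure here is an assumption: the fraction of URLs retrieved by an expanded query that do not already appear in the results of the original query. The function name `novel_fraction` and the example URLs are hypothetical.

```python
def novel_fraction(baseline_urls, expanded_urls):
    """Fraction of expanded-query URLs absent from the baseline results.

    Assumed measure for illustration only; the paper's own definition
    of overlap may differ.
    """
    baseline = set(baseline_urls)
    # Remove duplicate URLs from the expanded results, keeping their order.
    expanded = list(dict.fromkeys(expanded_urls))
    if not expanded:
        return 0.0
    novel = [url for url in expanded if url not in baseline]
    return len(novel) / len(expanded)


# Hypothetical result lists for an original query and one expanded query.
baseline = [
    "http://example.org/a",
    "http://example.org/b",
    "http://example.org/c",
]
expanded = [
    "http://example.org/b",
    "http://example.org/d",
    "http://example.org/e",
    "http://example.org/f",
]

print(novel_fraction(baseline, expanded))  # 3 of the 4 expanded URLs are new
```

Under this assumed measure, a lower overlap with the baseline gives a higher score, which matches the abstract's statement that less overlap indicates more useful information.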
