Automatic Generation of Frequent Case Forms of Query Keywords in Text Retrieval

This paper presents implementations of a generative method for managing the morphological variation of query keywords. The method, Frequent Case Generation (FCG), exploits the skewed distribution of word forms in natural languages: a small number of forms accounts for most occurrences of a word. It is suitable for languages that have either a fair amount of morphological variation or very rich morphology. The paper reports the implementation and evaluation of automatic procedures that generate variant forms of query keywords, using short and long queries from the CLEF collections for English, Finnish, German and Swedish. These languages show varying degrees of morphological complexity.
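The idea of generating only the frequent case forms can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `frequent_cases` and `expand_keyword`, the coverage threshold, the case counts, and the suffix table are all hypothetical and chosen only for illustration, and plain suffix concatenation ignores the stem alternations that a real generator for a language such as Finnish would have to handle.

```python
from collections import Counter

# Toy, made-up case counts standing in for a skewed distribution
# observed in some corpus. These are NOT figures from the paper.
TOY_COUNTS = {
    "nominative": 40,
    "genitive": 25,
    "partitive": 20,
    "inessive": 10,
    "elative": 5,
}

# Hypothetical suffix table for a Finnish noun with no stem change.
TOY_SUFFIXES = {
    "nominative": "",
    "genitive": "n",
    "partitive": "a",
    "inessive": "ssa",
    "elative": "sta",
}


def frequent_cases(case_counts, coverage=0.8):
    """Pick the most frequent cases until their combined share reaches `coverage`."""
    total = sum(case_counts.values())
    chosen, covered = [], 0
    for case, count in Counter(case_counts).most_common():
        if covered / total >= coverage:
            break
        chosen.append(case)
        covered += count
    return chosen


def expand_keyword(stem, suffixes, cases):
    """Generate one surface form per selected case by simple concatenation."""
    return [stem + suffixes[case] for case in cases]


if __name__ == "__main__":
    cases = frequent_cases(TOY_COUNTS, coverage=0.8)
    # The generated variants would be combined (for example, OR-ed) in the query.
    print(expand_keyword("talo", TOY_SUFFIXES, cases))
```

With these toy numbers, three of the five cases already cover 85% of occurrences, so the keyword "talo" is expanded to three forms instead of five. The coverage threshold is the assumed tuning knob here: a higher value generates more forms per keyword.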
