Language and Retrieval Accuracy

A major challenge in Information Retrieval is handling the massive amount of information available to Internet users. Existing ranking techniques and strategies that govern the retrieval process fall short of the expected accuracy: relevant documents are often buried deep in the list of results returned by a search engine. To improve retrieval accuracy, we examine the effect of language on the retrieval process. We then propose a solution that biases relevance toward the individual user, yielding a more user-centric ranking of the retrieved data. The results demonstrate that using indices based on variations of the same language enhances the accuracy of search engines for individual users.

Keywords: Information Search and Retrieval, Language Variants, Search Engine, Retrieval Accuracy.
