Comparison between Different Global Weighting Schemes

The goal of information retrieval is to enable users to retrieve, automatically and accurately, the data relevant to their queries. One approach to this problem is the vector space model, which represents documents and queries as vectors in term space. The components of these vectors are determined by the term weighting scheme. This paper compares a selected set of the available global term weighting schemes to determine which one is best suited to Arabic data collections. Our results show that the best method is the probabilistic inverse (IDFP) method, and we recommend it as the global weighting method for Arabic data collections.
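To make the role of a global weight concrete, here is a minimal sketch of a vector space retrieval loop in which only the global weighting scheme varies. It assumes the common textbook forms of three global weights: IDF as log(n / df), the probabilistic inverse IDFP as log((n - df) / df), and GFIDF as gf / df, where n is the number of documents, df a term's document frequency and gf its total frequency in the collection. The abstract does not state the exact variants, local weights or normalization the paper used, so raw term counts as the local weight, cosine similarity, the zero weight for terms occurring in every document, the toy corpus, and all function names (`term_stats`, `idf`, `idfp`, `gfidf`, `weight_vectors`, `rank`) are hypothetical choices of this sketch.

```python
import math
from collections import Counter

# Hypothetical toy corpus of pre-tokenized documents. A real Arabic collection
# would first need normalization and stemming; that step is out of scope here.
DOCS = [
    ["apple", "banana", "apple"],
    ["banana", "cherry"],
    ["cherry", "date", "apple"],
    ["date", "date", "elder"],
    ["apple", "fig"],
]


def term_stats(docs):
    """Per-document raw counts, document frequency (df) and global frequency (gf)."""
    tf = [Counter(d) for d in docs]
    vocab = sorted(set().union(*tf))
    df = {t: sum(1 for c in tf if t in c) for t in vocab}
    gf = {t: sum(c[t] for c in tf) for t in vocab}
    return tf, vocab, df, gf


def idf(n, df_t, gf_t):
    """Inverse document frequency: log(n / df)."""
    return math.log(n / df_t)


def idfp(n, df_t, gf_t):
    """Probabilistic inverse: log((n - df) / df).

    Negative when df > n/2 and undefined when df == n; this sketch assigns 0
    in the undefined case (an assumption, not taken from the paper).
    """
    if df_t >= n:
        return 0.0
    return math.log((n - df_t) / df_t)


def gfidf(n, df_t, gf_t):
    """Global frequency divided by document frequency: gf / df."""
    return gf_t / df_t


SCHEMES = {"IDF": idf, "IDFP": idfp, "GFIDF": gfidf}


def weight_vectors(docs, scheme):
    """Sparse document vectors: raw term count (assumed local weight) x global weight."""
    tf, vocab, df, gf = term_stats(docs)
    n = len(docs)
    g = {t: SCHEMES[scheme](n, df[t], gf[t]) for t in vocab}
    return [{t: c[t] * g[t] for t in c} for c in tf], g


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0 if either has zero length."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(docs, query, scheme):
    """Document indices ordered by cosine similarity to the query (ties keep input order)."""
    vectors, g = weight_vectors(docs, scheme)
    # Query terms absent from the collection get global weight 0.
    q = {t: c * g.get(t, 0.0) for t, c in Counter(query).items()}
    scores = [cosine(q, d) for d in vectors]
    return sorted(range(len(docs)), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    for name in SCHEMES:
        print(name, rank(DOCS, ["cherry", "date"], name))
```

In this toy corpus the query ["cherry", "date"] ranks document 3 above document 1 under IDF but document 1 above document 3 under IDFP. This happens because IDFP shrinks the weight of the mid-frequency term "date" much more, relative to the rare term "elder", than IDF does, and it gives a negative weight to "apple", which occurs in more than half the documents. A comparison such as the paper's would measure effects like this across a full collection rather than a single query.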
