Probabilistic information retrieval as a combination of abstraction, inductive learning, and probabilistic assumptions

We show that earlier approaches to probabilistic information retrieval are based on only one or two of three concepts: abstraction, inductive learning, and probabilistic assumptions. We propose a new approach that combines all three. The approach is illustrated for the case of indexing with a controlled vocabulary. For this purpose, we first describe a new probabilistic model, which is then combined with logistic regression, yielding a generalization of the original model. Experimental results are given for the purely theoretical model as well as for heuristic variants. In addition, linear and logistic regression are compared.
