AIR/X - A rule-based multistage indexing system for large subject fields

AIR/X is a rule-based system for indexing with terms (descriptors) from a prescribed vocabulary. This task requires an indexing dictionary containing rules that map terms from the text onto descriptors; the dictionary can be derived automatically from a set of manually indexed documents. Following the Darmstadt Indexing Approach, the indexing task is divided into a description step and a decision step. In the description step, terms (single words or phrases) are identified in the document text, and term-descriptor rules from the dictionary are applied to them to form descriptor indications. The set of all indications from a document that lead to the same descriptor is called a relevance description. In the decision step, a probabilistic classification procedure computes an indexing weight for each relevance description. Since the whole system is rule-based, it can be adapted to different subject fields by appropriate modifications of the rule bases. A major application of AIR/X is the AIR/PHYS system developed for a large physics database. This application is described in more detail along with experimental results.
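
The two-step pipeline described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from AIR/X: the names `Rule`, `RULES`, `describe`, `decide` and `index_document`, the example rules and their strengths, the naive substring matching used to find terms, and in particular the noisy-OR combination, which merely stands in for the probabilistic classification procedure that the abstract does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical term-descriptor rule from the indexing dictionary."""
    term: str          # single word or phrase looked up in the text
    descriptor: str    # descriptor from the prescribed vocabulary
    strength: float    # assumed rule strength in [0, 1], e.g. estimated
                       # from manually indexed documents


# Toy dictionary; the entries and numbers are invented for this example.
RULES = [
    Rule("neutron scattering", "NEUTRON DIFFRACTION", 0.6),
    Rule("neutron", "NEUTRONS", 0.5),
    Rule("crystal", "CRYSTAL STRUCTURE", 0.3),
    Rule("lattice", "CRYSTAL STRUCTURE", 0.4),
]


def describe(text, rules):
    """Description step: build one relevance description per descriptor.

    A rule whose term occurs in the text yields a descriptor indication.
    All indications pointing to the same descriptor are grouped together.
    Term identification is simplified to case-insensitive substring search.
    """
    lowered = text.lower()
    descriptions = defaultdict(list)
    for rule in rules:
        if rule.term in lowered:
            descriptions[rule.descriptor].append(rule)
    return dict(descriptions)


def decide(relevance_description):
    """Decision step: map a relevance description to an indexing weight.

    Assumption: a noisy-OR over rule strengths is used as a placeholder
    for the actual probabilistic classification procedure.
    """
    miss = 1.0
    for indication in relevance_description:
        miss *= 1.0 - indication.strength
    return 1.0 - miss


def index_document(text, rules=RULES):
    """Run both steps and return {descriptor: indexing weight}."""
    return {d: decide(rd) for d, rd in describe(text, rules).items()}


if __name__ == "__main__":
    doc = "Neutron scattering from a crystal lattice"
    for descriptor, weight in sorted(index_document(doc).items()):
        print(f"{descriptor}: {weight:.2f}")
```

Keeping the rules as plain data mirrors the adaptability claim in the abstract: in this sketch, switching to another subject field would mean replacing `RULES` while leaving `describe` and `decide` untouched.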
