The Application of Morpho-Syntactic Language Processing to Effective Phrase Matching

Abstract: Applying automatic natural language processing techniques to the indexing and retrieval of text has long been a goal of information retrieval researchers. Incorporating semantic-level processing of language into retrieval has led to conceptual information retrieval, which is effective but usually restricted to a narrow domain. Syntactic-level analysis is domain-independent, but has not yet yielded significant improvements in retrieval quality. This paper describes a process in which a morpho-syntactic analysis of phrases or user queries is used to generate a structured representation of the text. A process for matching these structured representations is then described; it generates a score indicating the degree of match between phrases, which can then be used to rank the phrases. To evaluate the quality of the matching and scoring of phrases, some experiments are described that indicate the method to be quite useful. Ultimately, the phrase-matching technique described here would be used as part of an overall document retrieval strategy, and some future work in this direction is outlined.
