Term clustering of syntactic phrases

Term clustering and syntactic phrase formation are methods for transforming natural language text. Both have had only mixed success as strategies for improving the quality of text representations for document retrieval. Since the strengths of these methods are complementary, we have explored combining them to produce superior representations. In this paper we discuss our implementation of a syntactic phrase generator, as well as our preliminary experiments with producing phrase clusters. These experiments show small improvements in retrieval effectiveness resulting from the use of phrase clusters, but it is clear that corpora much larger than standard information retrieval test collections will be required to thoroughly evaluate the use of this technique.
