A Comparison Between Statistically and Syntactically Generated Term Phrases

It is customary to use single terms (words) or terms in context (phrases) as indexing units for the representation of natural-language text content. There is evidence that term phrases may provide some advantages over the use of single terms for text content representation. This note presents an evaluation of the expected usefulness of automatic term phrase generation systems involving syntactic processing compared with methods based only on the statistical co-occurrence characteristics of individual text words.