Finding relevant passages using noun-noun compounds: Coherence vs. proximity

Intuitively, words forming phrases describe content more precisely than the same words treated as a sequence of keywords. Yet the evidence that phrases are more effective for information retrieval is inconclusive. This paper isolates a neglected class of phrases that is abundant in communication, has an established theoretical foundation, and shows promise for an effective expression of the user's information need: the noun-noun compound (NNC). In an experiment, a variety of meaningful NNCs were used to isolate relevant passages in a large and varied corpus. In a first pass, passages were retrieved based on textual proximity of the words or their semantic peers. A second pass retained only passages containing a syntactically coherent structure equivalent to the original NNC. This second pass showed a dramatic increase in precision. Preliminary results support our intuition about phrases in the special but very productive case of NNCs.
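
The two-pass idea can be illustrated with a minimal sketch. It is not the procedure used in the experiment: the function names, the `window` parameter, and the sample passages are all hypothetical, semantic peers are left out, and "syntactic coherence" is reduced to the crude assumption that the modifier noun immediately precedes the head noun.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def proximity_match(tokens, modifier, head, window=5):
    """First pass: both nouns occur within `window` tokens, in any order."""
    mod_pos = [i for i, t in enumerate(tokens) if t == modifier]
    head_pos = [i for i, t in enumerate(tokens) if t == head]
    return any(abs(i - j) <= window for i in mod_pos for j in head_pos)


def coherent_match(tokens, modifier, head):
    """Second pass (assumed stand-in for a parser): modifier directly before head."""
    return any(a == modifier and b == head for a, b in zip(tokens, tokens[1:]))


def two_pass(passages, modifier, head, window=5):
    """Return (first-pass passages, passages surviving the second pass)."""
    first = [p for p in passages
             if proximity_match(tokenize(p), modifier, head, window)]
    second = [p for p in first
              if coherent_match(tokenize(p), modifier, head)]
    return first, second


# Hypothetical passages for the compound "oil price".
passages = [
    "The oil price rose sharply.",
    "The price of bread rose while oil stayed flat.",
    "Nothing relevant here.",
]
first, second = two_pass(passages, "oil", "price")
```

In this toy input the second passage passes the proximity filter because both nouns fall within the window, but it is dropped by the second pass because they do not form a compound. A real second pass would need syntactic analysis to recognise structures equivalent to the NNC, which adjacency alone cannot do.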