Comparing the Effect of Syntactic vs. Statistical Phrase Indexing Strategies for Dutch

In this paper we describe the results of experiments contrasting syntactic phrase indexing with statistical phrase indexing for Dutch texts. Our results show that a compound splitting algorithm is a minimum requirement for good retrieval quality on Dutch texts. Adding either syntactic or statistical phrases on top of compound splitting generally improves performance, but the improvement is never statistically significant. Comparing the two strategies directly, syntactic phrases are slightly superior to statistical phrases, particularly at high precision; at higher recall levels the two are equally effective. However, since a compound splitting algorithm requires a dictionary and knowledge of the constraints on compound formation, a purely non-linguistic indexing strategy, with or without phrases, does not seem to be very effective for Dutch.
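To illustrate why compound splitting depends on linguistic resources, the following is a minimal sketch of a dictionary-based splitter. It is not the algorithm used in the experiments: the toy lexicon `LEXICON`, the linking elements in `LINKERS`, the minimum part length `MIN_PART`, and the function name `split_compound` are all assumptions made for this example. The point is only that both a word list and a rule about what may join two parts are needed before a compound such as "stadsbestuur" can be indexed under its components.

```python
from typing import List, Optional

# Hypothetical toy lexicon; a real system would need a full Dutch dictionary.
LEXICON = {"fiets", "band", "stad", "bestuur", "verkeer", "regel"}

# Assumed linking elements that may appear between compound parts
# (a stand-in for real constraints on compound formation).
LINKERS = ("", "s", "en", "e")

# Assumed minimum length of a compound part, to avoid spurious short splits.
MIN_PART = 3


def split_compound(word: str) -> Optional[List[str]]:
    """Return the lexicon parts of `word`, or None if no decomposition exists."""
    word = word.lower()
    if word in LEXICON:
        return [word]
    # Try every split point that leaves at least MIN_PART characters on each side.
    for i in range(MIN_PART, len(word) - MIN_PART + 1):
        head = word[:i]
        if head not in LEXICON:
            continue
        rest = word[i:]
        # Allow an optional linking element before the remainder.
        for linker in LINKERS:
            if not rest.startswith(linker):
                continue
            tail = split_compound(rest[len(linker):])
            if tail is not None:
                return [head] + tail
    return None


if __name__ == "__main__":
    for w in ["fietsband", "stadsbestuur", "verkeersregel", "xyz"]:
        print(w, split_compound(w))
```

Without the lexicon and the linker list, the splitter has no basis for accepting "stad" + "s" + "bestuur" while rejecting arbitrary substrings, which is the dependency on linguistic knowledge that the abstract points to.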