Research on Hybrid Index for Chinese IR

In Chinese information retrieval (IR), choosing the terms that serve as index units is essential when processing documents and queries. This paper proposes new kinds of hybrid index that combine words and bigrams. Such a hybrid index reduces the impact of out-of-vocabulary words and segmentation ambiguity on Chinese IR, because segmentation ambiguities are detected flexibly with the dictionary rather than rigidly with an ambiguity table. Experiments show that the new hybrid index is not only comparable with bigram indexing but also improves retrieval efficiency.
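The abstract does not describe the indexing procedure in detail, so the following Python snippet is only a minimal sketch of one way a word/bigram hybrid index could be built, not the method evaluated in the paper. It assumes forward longest matching against a dictionary, treats a dictionary word as ambiguous when another dictionary word starts inside it and ends after it, and falls back to overlapping bigrams for ambiguous or out-of-vocabulary spans. The function names (`longest_match`, `bigrams`, `hybrid_index_terms`) and the toy dictionary are hypothetical.

```python
def longest_match(text, start, dictionary, max_len):
    """Return the longest dictionary word (2+ chars) starting at `start`, or None."""
    for length in range(min(max_len, len(text) - start), 1, -1):
        candidate = text[start:start + length]
        if candidate in dictionary:
            return candidate
    return None


def bigrams(span):
    """Overlapping character bigrams of `span`; a lone character is kept as-is."""
    if len(span) < 2:
        return [span] if span else []
    return [span[i:i + 2] for i in range(len(span) - 1)]


def hybrid_index_terms(text, dictionary):
    """Emit dictionary words where matching is unambiguous, bigrams elsewhere.

    Assumption (not from the paper): a span is ambiguous when a dictionary
    word begins inside the current longest match and extends past its end.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    terms = []
    pending = ""  # contiguous characters waiting to be indexed as bigrams
    i = 0
    while i < len(text):
        word = longest_match(text, i, dictionary, max_len)
        if word is None:
            # Out-of-vocabulary character: defer to bigram indexing.
            pending += text[i]
            i += 1
            continue
        end = i + len(word)
        crossing_end = None
        for j in range(i + 1, end):
            other = longest_match(text, j, dictionary, max_len)
            if other is not None and j + len(other) > end:
                crossing_end = j + len(other)
                break
        if crossing_end is not None:
            # Overlapping candidates: index the whole ambiguous span as bigrams.
            pending += text[i:crossing_end]
            i = crossing_end
        else:
            terms.extend(bigrams(pending))
            pending = ""
            terms.append(word)
            i = end
    terms.extend(bigrams(pending))
    return terms


# Toy dictionary, purely illustrative.
toy_dictionary = {"研究", "研究生", "生命", "起源"}
ambiguous_terms = hybrid_index_terms("研究生命起源", toy_dictionary)
plain_terms = hybrid_index_terms("生命起源", toy_dictionary)
oov_terms = hybrid_index_terms("王菲起源", toy_dictionary)
```

In this sketch the string "研究生命起源" contains the overlapping candidates "研究生" and "生命", so that span is indexed by bigrams, while "起源" is indexed as a word. Because the ambiguity is detected from the dictionary at indexing time, no precompiled ambiguity table is needed.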