Pseudo-frequency method (poster session): an efficient document ranking retrieval method for n-gram indexing

Although n-gram (n successive characters) indexing is widely used in retrieval systems for documents in Japanese and other Asian languages, ranking retrieval is difficult to perform efficiently with an n-gram index. This is because the frequencies of query words are not stored directly in the index and must instead be computed from the indexed n-gram data. To reduce this processing cost, this paper proposes a pseudo-frequency method, which uses estimated word frequencies instead of exact ones. Experiments on NTCIR, a Japanese IR test collection, showed that the proposed method sped up retrieval without degrading retrieval effectiveness.
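To make the cost difference concrete, here is a minimal sketch contrasting an exact word frequency with an estimated one over a toy positional bigram index. The abstract does not say how the pseudo-frequency is estimated. This sketch therefore assumes a simple stand-in: the smallest per-document count among the word's constituent n-grams. That value is always an upper bound on the true count, and computing it needs only posting-list lengths, with no positional comparisons. All names (`build_index`, `exact_tf`, `pseudo_tf`, `N`) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

N = 2  # hypothetical n-gram length (bigrams)

def ngrams(text: str, n: int = N) -> list[str]:
    """Split text into overlapping n-grams (n successive characters)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def build_index(docs: dict[str, str], n: int = N) -> dict:
    """Positional index: n-gram -> doc_id -> start positions in that document."""
    index: dict = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, gram in enumerate(ngrams(text, n)):
            index[gram][doc_id].append(pos)
    return index

def postings(index: dict, gram: str, doc_id: str) -> list[int]:
    """Positions of one n-gram in one document (empty if absent; .get avoids creating entries)."""
    return index.get(gram, {}).get(doc_id, [])

def exact_tf(index: dict, word: str, doc_id: str, n: int = N) -> int:
    """Exact frequency: the word's n-grams must occur at consecutive offsets.
    Assumes len(word) >= n."""
    grams = ngrams(word, n)
    position_sets = [set(postings(index, g, doc_id)) for g in grams]
    # Count start positions p where gram i is found at p + i for every i.
    return sum(
        all(p + i in position_sets[i] for i in range(len(grams)))
        for p in postings(index, grams[0], doc_id)
    )

def pseudo_tf(index: dict, word: str, doc_id: str, n: int = N) -> int:
    """ASSUMED estimator, not the paper's: minimum per-document count of the word's n-grams.
    Uses only posting-list lengths and is an upper bound on exact_tf."""
    return min(len(postings(index, g, doc_id)) for g in ngrams(word, n))

docs = {"d1": "abcabcab", "d2": "abxbc"}
index = build_index(docs)
for d in docs:
    print(d, exact_tf(index, "abc", d), pseudo_tf(index, "abc", d))
# d1 2 2
# d2 0 1
```

In `d2` the estimate overcounts because the bigrams "ab" and "bc" both occur but are not adjacent. This is the kind of error an estimate trades for skipping the positional check.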