Inverted files are widely used to index documents in large-scale information retrieval systems. An inverted file consists of posting lists, which can be stored either in ascending order of document identifier or in descending order of document weight. For an identifier-ascending posting list, retrieving ranked documents requires traversing all postings, whereas for a weight-descending posting list, performing Boolean queries involves very complex processing. In this paper, we transform a posting list into a tree-based structure, called the n-key-heap posting tree, to speed up ranked-document retrieval for Boolean queries. This structure preserves the order of document identifiers and the order of document weights simultaneously. To preserve identifier order, the edge pointers are designed to maintain numerical order in the posting tree. To preserve weight order, postings with greater weights are stored in higher tree nodes, following the heap property. We model these criteria as a tree-construction problem and propose an efficient algorithm to construct an optimal posting tree with minimal access time.
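The abstract does not spell out how the n-key-heap posting tree is laid out, so the following is only a minimal sketch under simplifying assumptions. Each node holds a single posting (a binary Cartesian tree, similar to a treap), not the paper's n keys per node. Weights are floats, document identifiers are unique, and the names `Node`, `build_posting_tree`, `top_k`, `contains` and `in_order` are hypothetical. The sketch illustrates how one tree can keep identifier order (in-order traversal and search by document identifier) and weight order (max-heap property) at the same time. It does not implement the paper's optimal construction algorithm.

```python
import heapq
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    # Assumption: one posting per node (n = 1), not the paper's n-key nodes.
    doc_id: int
    weight: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_posting_tree(postings: List[Tuple[int, float]]) -> Optional[Node]:
    """Build a Cartesian tree from postings sorted by ascending doc_id.

    In-order traversal yields doc_ids in ascending order (identifier order),
    and each node's weight is >= its descendants' weights (heap order).
    """
    stack: List[Node] = []  # right spine of the tree built so far
    for doc_id, weight in postings:
        node = Node(doc_id, weight)
        last = None
        # Lighter nodes on the spine move below the new node as its left subtree.
        while stack and stack[-1].weight < weight:
            last = stack.pop()
        node.left = last
        if stack:
            stack[-1].right = node
        stack.append(node)
    return stack[0] if stack else None


def in_order(node: Optional[Node]) -> Iterator[int]:
    """Yield doc_ids in ascending order."""
    if node is not None:
        yield from in_order(node.left)
        yield node.doc_id
        yield from in_order(node.right)


def top_k(root: Optional[Node], k: int) -> List[Tuple[int, float]]:
    """Return the k heaviest postings by best-first search from the root.

    By the heap property a node can rank in the top k only if its parent
    does, so only k nodes are expanded instead of scanning every posting.
    """
    result: List[Tuple[int, float]] = []
    if root is None:
        return result
    # Doc ids are unique, so ties on weight never fall through to comparing Nodes.
    frontier = [(-root.weight, root.doc_id, root)]
    while frontier and len(result) < k:
        neg_weight, doc_id, node = heapq.heappop(frontier)
        result.append((doc_id, -neg_weight))
        for child in (node.left, node.right):
            if child is not None:
                heapq.heappush(frontier, (-child.weight, child.doc_id, child))
    return result


def contains(root: Optional[Node], doc_id: int) -> bool:
    """Search by doc_id using the identifier (search-tree) order."""
    node = root
    while node is not None:
        if doc_id == node.doc_id:
            return True
        node = node.left if doc_id < node.doc_id else node.right
    return False


postings = [(2, 0.3), (5, 0.9), (7, 0.1), (11, 0.6), (13, 0.4), (20, 0.8)]
root = build_posting_tree(postings)
print(list(in_order(root)))                   # [2, 5, 7, 11, 13, 20]
print(top_k(root, 3))                         # [(5, 0.9), (20, 0.8), (11, 0.6)]
print(contains(root, 13), contains(root, 6))  # True False
```

With one posting per node the tree shape is fully determined by the input, and its depth can grow to O(n) when weights are monotone in identifier order. The paper's n-key nodes and optimal construction target access cost, which this sketch makes no attempt to minimize.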