On efficient posting list intersection with multicore processors

The size of the indexable Web and the number of search queries submitted by users have grown consistently throughout the past decade. With such growth, efficient and scalable methods for implementing information retrieval (IR) systems become critical for user satisfaction. Thus far, the performance of IR systems with respect to query throughput and query latency has been improved by designing new list intersection algorithms [2] and by developing novel caching strategies [1]. In contrast to these techniques, we explore a new research direction for improving IR efficiency: designing algorithms that leverage modern computer architectures such as multicore systems. Multicores, primarily motivated by energy and power constraints, pack two or more cores on a single die. The cores typically share an on-chip L2 cache as well as the front-side bus to main memory. As these systems become more popular, the general trend has been from single-core to many-core: from dual-, quad-, and eight-core chips to chips with tens of cores.

So far, however, very little has been done to exploit the full potential of these chips in the context of IR. Strohman and Croft used 64-bit machines and four-core chips to show modest improvements in throughput [6]. Their techniques suffer from bandwidth issues and, as a result, provide only limited scalability. Bonacic et al. used synchronous strategies to group queries into batches and then process the batches sequentially [3]. Ding et al. parallelized posting list intersection using graphics processors (GPUs) [4]. These techniques, however, fail to deliver good performance as the number of cores increases.

In this article, we present and discuss two parallel query processing models for multicore systems: inter-query parallelism and intra-query parallelism. While the former exploits the parallelism between queries, the latter exploits the parallelism within a single query.
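To make the distinction concrete, the following C++ sketch contrasts the two models on a toy two-term query workload. It is a minimal illustration under assumed names (PostingList, intersect, run_inter_query, run_intra_query), not the implementation evaluated in this article: inter-query parallelism assigns whole queries to threads, while intra-query parallelism splits the shorter posting list of one query into chunks and intersects each chunk against the matching docID range of the longer list.

// Minimal sketch, not the authors' implementation; all names are illustrative.
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <thread>
#include <utility>
#include <vector>

using PostingList = std::vector<int>;                 // sorted document IDs
using Query = std::pair<PostingList, PostingList>;    // two-term query

// Standard merge-based intersection of two sorted posting lists.
PostingList intersect(const PostingList& a, const PostingList& b) {
    PostingList out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// Inter-query parallelism: each thread repeatedly claims a whole query from a
// shared counter and processes it end to end; no two threads share a query.
void run_inter_query(const std::vector<Query>& queries,
                     std::vector<PostingList>& results, unsigned num_threads) {
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (std::size_t q; (q = next.fetch_add(1)) < queries.size(); )
            results[q] = intersect(queries[q].first, queries[q].second);
    };
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}

// Intra-query parallelism: one query is partitioned among threads by chopping
// the shorter list into chunks; each thread intersects its chunk against the
// corresponding docID range of the longer list.
PostingList run_intra_query(const PostingList& shorter, const PostingList& longer,
                            unsigned num_threads) {
    std::vector<PostingList> partial(num_threads);
    std::vector<std::thread> pool;
    std::size_t chunk = (shorter.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        pool.emplace_back([&, t] {
            std::size_t lo = std::min(t * chunk, shorter.size());
            std::size_t hi = std::min(lo + chunk, shorter.size());
            if (lo == hi) return;
            // Restrict the longer list to the docID range covered by this chunk.
            auto first = std::lower_bound(longer.begin(), longer.end(), shorter[lo]);
            auto last  = std::upper_bound(first, longer.end(), shorter[hi - 1]);
            std::set_intersection(shorter.begin() + lo, shorter.begin() + hi,
                                  first, last, std::back_inserter(partial[t]));
        });
    }
    for (auto& th : pool) th.join();
    PostingList out;
    for (auto& p : partial) out.insert(out.end(), p.begin(), p.end());
    return out;
}

int main() {
    std::vector<Query> queries = {{{1, 3, 5, 7, 9}, {3, 4, 5, 9, 12}},
                                  {{2, 4, 6, 8}, {4, 8, 16}}};
    std::vector<PostingList> results(queries.size());
    run_inter_query(queries, results, 2);                              // one query per thread
    PostingList r = run_intra_query(queries[0].first, queries[0].second, 2);
    std::cout << results[0].size() << " " << r.size() << "\n";         // prints "3 3"
}

In the intra-query variant, concatenating the per-chunk outputs preserves docID order because the chunks cover disjoint, increasing docID ranges of the shorter list.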