Parallel Document Inversion using GPU

Recent advances in Graphics Processing Unit (GPU) technology have led to a surge of interest in using the GPU for general-purpose applications. Because the GPU consists of multiple cores, it can serve as a massively parallel co-processor. The GPU is also an affordable, user-programmable commodity. The inverted index is a useful data structure for full-text search and document retrieval, but building the index over a large number of documents takes a great deal of time. A multicore GPU can improve the performance of document inversion. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD) document inversion algorithm on the NVIDIA GPU/CUDA programming platform, using the computational power of the GPU to develop high-performance solutions for document indexing.
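To make the idea of hash-based, SPMD-style document inversion concrete, the following is a minimal CPU-side sketch in Python. It is not the CUDA implementation described above. The function names (`tokenize`, `invert_partition`, `invert`), the strided assignment of documents to workers, and the sample documents are all assumptions made for illustration. Each "worker" runs the same program on a different slice of the documents and fills its own hash table, and the per-worker tables are then merged. Because each term occurrence costs one expected constant-time hash-table update, the expected total work is linear in the input size.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and yield its alphanumeric terms (hypothetical helper)."""
    term = []
    for ch in text.lower():
        if ch.isalnum():
            term.append(ch)
        elif term:
            yield "".join(term)
            term = []
    if term:
        yield "".join(term)


def invert_partition(docs, worker_id, num_workers):
    """One SPMD-style worker: same program, different slice of the documents.

    Worker w handles documents w, w + num_workers, w + 2 * num_workers, ...
    (an assumed partitioning scheme, chosen only for this sketch).
    """
    local = defaultdict(set)  # hash table: term -> set of document ids
    for doc_id in range(worker_id, len(docs), num_workers):
        for term in tokenize(docs[doc_id]):
            local[term].add(doc_id)
    return local


def invert(docs, num_workers=4):
    """Run every worker (sequentially here) and merge the per-worker tables."""
    index = defaultdict(set)
    for worker_id in range(num_workers):
        partial = invert_partition(docs, worker_id, num_workers)
        for term, doc_ids in partial.items():
            index[term] |= doc_ids
    # Sorted posting lists make the result deterministic.
    return {term: sorted(doc_ids) for term, doc_ids in index.items()}


docs = [
    "GPU computing is fast",
    "The inverted index maps terms to documents",
    "GPU index construction",
]
index = invert(docs)
```

In this sketch the workers run one after another. On a GPU they would run concurrently, and the merge step would need a scheme for concurrent updates to shared tables, which this example does not attempt to model.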
