An efficient PIM (Processor-In-Memory) architecture for BLAST

BLAST is a widely used tool for finding similarities in protein and DNA sequences. However, general-purpose processors do not execute the kernels of BLAST efficiently because of the kernels' special computational requirements. In this paper, we propose an efficient PIM (Processor-In-Memory) architecture for executing these kernels. Our approach not only reduces memory latency and increases memory bandwidth, but also executes operations inside the memory, where the data are located. We further exploit parallelism by dividing the memory into small segments, each of which executes operations concurrently. Our simulation results show that this computing paradigm provides a 242x performance improvement for the execution of the kernels and a 12x performance improvement for the overall execution of BLAST.
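The segment-level parallelism described above can be illustrated in software. The following is a minimal sketch, not the proposed hardware: it assumes the kernel of interest is exact word matching (the seeding step of BLAST), and the names `find_word_hits` and `segmented_scan`, the word length `w`, and the segment count are hypothetical choices made for illustration. Each segment is scanned independently, with an overlap of `w - 1` characters so that words spanning a segment boundary are not missed.

```python
from concurrent.futures import ThreadPoolExecutor


def find_word_hits(segment, offset, words, w):
    """Scan one segment; return (query_pos, db_pos) pairs for exact w-mer matches."""
    hits = []
    for i in range(len(segment) - w + 1):
        word = segment[i:i + w]
        for q_pos in words.get(word, ()):
            hits.append((q_pos, offset + i))
    return hits


def segmented_scan(query, database, w=3, num_segments=4):
    """Split the database into segments and scan them concurrently (illustrative only)."""
    # Index every w-mer of the query by its starting position.
    words = {}
    for q in range(len(query) - w + 1):
        words.setdefault(query[q:q + w], []).append(q)

    # Segment length: ceiling division, but never shorter than one word.
    seg_len = max(w, -(-len(database) // num_segments))

    # Each segment carries w - 1 extra characters so boundary-spanning words are seen
    # exactly once, by the segment in which they start.
    tasks = []
    for start in range(0, len(database), seg_len):
        end = min(len(database), start + seg_len + w - 1)
        tasks.append((database[start:end], start))

    # Threads stand in for the independent memory segments of a PIM design.
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda t: find_word_hits(t[0], t[1], words, w), tasks)
        return sorted(hit for part in results for hit in part)


if __name__ == "__main__":
    print(segmented_scan("ACGT", "TTACGTACGG", w=3, num_segments=4))
```

In this sketch the threads only model the independence of the segments; the speedups reported in the abstract come from performing the work inside the memory itself, which a software analogy cannot reproduce.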
