COMPUTE PAIRWISE EUCLIDEAN DISTANCES OF DATA POINTS WITH GPUS

Graphics processing units (GPUs) are powerful computational devices tailored toward the needs of the 3-D gaming industry for high-performance, real-time graphics engines. Nvidia released a new generation of GPUs designed for general-purpose computing in 2006, and a GPU programming language called CUDA in 2007. DNA microarray technology is a high-throughput tool for assaying gene expression in cell cultures or tissue samples. During the exploratory phase of data analysis, scientists often apply (agglomerative) hierarchical clustering to the genes. In hierarchical clustering, a fundamental operation is to calculate the pairwise distances among all genes. If there are n genes, this takes O(n^2) time. In the present study, we examine how to use GPUs and the CUDA language to speed up the calculation. Our results show a 20- to 44-fold speedup on the GPU compared with the CPU implementation.
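To make the O(n^2) cost concrete, the following is a minimal CPU-side sketch of the all-pairs Euclidean distance computation, written in plain Python. It is not the implementation evaluated in the study; the function name `pairwise_euclidean` and the toy `genes` data are assumptions introduced for illustration only.

```python
import math


def pairwise_euclidean(points):
    """Return the n x n matrix of Euclidean distances between rows of `points`.

    `points` is a list of n equal-length lists of floats (one row per gene).
    """
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The matrix is symmetric, so compute only j > i and mirror the value.
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            dist[i][j] = d
            dist[j][i] = d
    return dist


# Hypothetical toy data: three "genes" measured under two conditions.
genes = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dist = pairwise_euclidean(genes)
```

Each entry dist[i][j] depends only on rows i and j, so the entries can be computed independently of one another. That independence is what makes the problem a natural fit for a GPU, where one plausible mapping assigns each (i, j) pair to its own thread.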
