Constructing a Mobility and Acceleration Computing Platform with NVIDIA Jetson TK1

Current high-end graphics processing units (GPUs), which contain up to thousands of cores per chip, are widely used in the high-performance computing community. In the past, however, building a high-performance platform with graphics cards such as the Tesla and Fermi series has been costly in both price and power consumption. Moreover, these graphics cards are installed in personal computers or servers, so such a platform cannot meet requirements for immediacy and mobility. NVIDIA Jetson TK1 (Tegra K1) is a full-featured platform for embedded applications that contains 192 CUDA cores (Kepler GPU). Owing to its low cost, low power consumption, and high applicability, NVIDIA Jetson TK1 has become a new research direction. In this paper, we construct a mobility and acceleration computing platform with NVIDIA Jetson TK1. In addition, two tools, ClustalWtk and MCCtk, are designed for NVIDIA Jetson TK1. On a single NVIDIA Jetson TK1, both tools achieve speedups of 3 and 4 times over their CPU versions running on an Intel Xeon E5-2650 CPU and an ARM Cortex-A15 CPU, respectively. Moreover, the cost-performance ratio of NVIDIA Jetson TK1 is higher than that of NVIDIA Tesla K20m. Both tools also provide user-friendly interfaces.
