Tsunami: massively parallel homomorphic hashing on many‐core GPUs

Homomorphic hash functions play a key role in securing distributed systems that use coding techniques such as erasure coding and network coding. Their computational cost, however, remains a major challenge. In this paper, we present Tsunami, a massively parallel solution that exploits widely available many‐core graphics processing units (GPUs). Tsunami combines the following optimization techniques to achieve the highest hashing throughput reported to date: (1) using Montgomery multiplication and precomputation to speed up modular exponentiation; (2) using a lean implementation of Montgomery multiplication that reduces the demand for registers and shared memory and increases the utilization of GPU processing cores; (3) using our own assembly code to implement 32‐bit integer multiplication, which outperforms the assembly code generated by the native compiler by 20%; and (4) exploiting memory alignment and constant memory on GPUs to improve the efficiency of memory access. Together, these techniques give Tsunami a significant improvement over existing results. Specifically, the hashing throughput achieved by Tsunami on a GTX295 GPU (NVIDIA, Santa Clara, CA, US) is about 33 times that of the existing solution on a quad‐core CPU. We also show that the hashing throughput grows almost linearly with the number of GPU cores. Copyright © 2011 John Wiley & Sons, Ltd.
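To make the first technique concrete, the following is a minimal CPU-side sketch in Python, not the paper's GPU implementation. It assumes a hash of the common form h(b) = ∏ g_i^{b_i} mod p, with generators g_i of prime order q; the abstract does not spell out this construction. The toy parameters `P`, `Q`, `GENS` and all class and function names are hypothetical. The sketch uses Montgomery reduction so that modular exponentiation avoids trial division, and it omits the precomputation, hand-written assembly, and memory-layout optimizations listed above.

```python
# Toy parameters (hypothetical, far too small for real security):
# P is an odd prime, Q is a prime dividing P - 1, and every generator
# in GENS has multiplicative order Q modulo P.
P, Q = 23, 11
GENS = [4, 9, 2]


class Montgomery:
    """Montgomery arithmetic modulo an odd modulus p (needs Python 3.8+)."""

    def __init__(self, p):
        assert p % 2 == 1, "Montgomery reduction needs an odd modulus"
        self.p = p
        self.k = p.bit_length()          # R = 2^k > p
        self.mask = (1 << self.k) - 1    # x & mask == x mod R
        # -p^{-1} mod R, used to cancel the low k bits in redc().
        self.p_neg_inv = (-pow(p, -1, 1 << self.k)) & self.mask
        self.one = (1 << self.k) % p     # Montgomery form of 1

    def redc(self, t):
        """Return t * R^{-1} mod p for 0 <= t < R * p, without dividing by p."""
        m = ((t & self.mask) * self.p_neg_inv) & self.mask
        u = (t + m * self.p) >> self.k   # exact: low k bits are zero
        return u - self.p if u >= self.p else u

    def to_mont(self, a):
        return (a << self.k) % self.p

    def from_mont(self, a_bar):
        return self.redc(a_bar)

    def mul(self, a_bar, b_bar):
        """Multiply two values that are already in Montgomery form."""
        return self.redc(a_bar * b_bar)

    def pow(self, base, exp):
        """Square-and-multiply; the result stays in Montgomery form."""
        acc = self.one
        b_bar = self.to_mont(base % self.p)
        while exp:
            if exp & 1:
                acc = self.mul(acc, b_bar)
            b_bar = self.mul(b_bar, b_bar)
            exp >>= 1
        return acc


def homomorphic_hash(block, gens=GENS, p=P):
    """h(block) = prod(g_i ** b_i) mod p, computed in the Montgomery domain."""
    mont = Montgomery(p)
    acc = mont.one
    for g, b in zip(gens, block):
        acc = mont.mul(acc, mont.pow(g, b))
    return mont.from_mont(acc)


if __name__ == "__main__":
    a, b = [3, 7, 1], [5, 2, 10]
    # A coded block: linear combination 2*a + 3*b with exponents taken mod Q.
    coded = [(2 * x + 3 * y) % Q for x, y in zip(a, b)]
    expected = (pow(homomorphic_hash(a), 2, P) * pow(homomorphic_hash(b), 3, P)) % P
    print(homomorphic_hash(coded) == expected)
```

The final check illustrates why such hashes suit network and erasure coding: a receiver can verify a coded block from the hashes of the original blocks alone. Each `mont.pow` call is independent of the others, and that independence is the kind of parallelism a many-core GPU can exploit.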
