CGHub: Kick-starting the Worldwide Genome Web

The University of California, Santa Cruz (UCSC) is under contract with the National Cancer Institute (NCI) to build and operate the Cancer Genomics Hub (CGHub), a nation-scale library and user portal for cancer genomics data. The contract covers growth of the library to 5 petabytes. The NCI programs that feed the library currently produce about 20 terabytes of data each month. We discuss Annai GeneTorrent (GT), a receiver-driven file transfer mechanism for use with the library. Annai GT parallelizes genome downloads by using multiple TCP streams from multiple computers at the library site. We review our performance experience with the new transfer mechanism and explain the additions to the transfer protocol that support the security required for handling patient cancer genomics data.
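
The idea of spreading one download over several streams and several source machines can be illustrated with a small sketch. The code below is not GeneTorrent's protocol or API; it is a generic BitTorrent-style piece plan in which a receiver splits a file into fixed-size pieces and assigns them round-robin to a set of servers. The names `Piece` and `plan_pieces`, the piece size, and the server labels are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Piece:
    """One byte range of the file and the (hypothetical) server asked for it."""
    index: int
    offset: int
    length: int
    server: str


def plan_pieces(file_size: int, piece_size: int, servers: Sequence[str]) -> List[Piece]:
    """Split a file into pieces and assign them to servers round-robin.

    This is an illustrative assumption about how a receiver could schedule
    requests; a real client would also handle retries, integrity checks,
    and encryption, none of which are modeled here.
    """
    if file_size < 0 or piece_size <= 0 or not servers:
        raise ValueError("need file_size >= 0, piece_size > 0, and at least one server")

    pieces: List[Piece] = []
    offset = 0
    index = 0
    while offset < file_size:
        # The last piece may be shorter than piece_size.
        length = min(piece_size, file_size - offset)
        server = servers[index % len(servers)]
        pieces.append(Piece(index, offset, length, server))
        offset += length
        index += 1
    return pieces


if __name__ == "__main__":
    # Example: a 10-byte "file", 4-byte pieces, two hypothetical servers.
    for piece in plan_pieces(10, 4, ["server-a", "server-b"]):
        print(piece)
```

Each piece in such a plan could be fetched over its own TCP connection, so that the aggregate rate is not limited by a single stream or a single source machine.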
