Coinami: A Cryptocurrency with DNA Sequence Alignment as Proof-of-work

The rate of growth of the amount of data generated by high-throughput sequencing (HTS) platforms now exceeds the growth predicted by Moore's Law. The volume of HTS data is expected to surpass that of other "big data" domains, such as astronomy, before the year 2025. In addition to sequencing genomes for research purposes, genome and exome sequencing in clinical settings will become a routine part of health care. The analysis of such large amounts of data, however, is not without computational challenges. This burden is further increased by periodic updates to reference genomes, which typically require re-analysis of existing data. Here we propose the Coin-Application Mediator Interface (Coinami) to distribute the workload of mapping reads to reference genomes using a volunteer grid computing approach similar to the Berkeley Open Infrastructure for Network Computing (BOINC). However, since HTS read mapping requires substantial computational resources and fast analysis turnaround is desired, Coinami uses HTS read mapping as the proof-of-work to generate valid blocks and maintain its own cryptocurrency system, which may help motivate volunteers to dedicate more resources. The Coinami protocol includes mechanisms to ensure that jobs performed by volunteers are correct, and it provides genomic data privacy. The prototype implementation of Coinami is available at http://coinami.github.io/.
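The abstract does not say how the correctness of volunteer work is checked. The Python sketch below is a hypothetical illustration of the general idea of "useful" proof-of-work, not Coinami's actual protocol. It assumes ungapped mapping (each read placed at one reference offset and scored by Hamming distance), a verifier that re-checks only a random sample of the reported alignments, and a block hash that commits to the verified results. The names `Alignment`, `mismatches`, `spot_check`, `work_digest`, and `validate_block`, and the parameters `sample_size`, `max_mismatches`, and `seed`, are all assumptions made for illustration.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """A volunteer's reported mapping (hypothetical record, ungapped only)."""
    read_id: str
    read: str
    position: int  # 0-based offset of the read on the reference


def mismatches(reference: str, read: str, position: int) -> int:
    """Hamming distance between the read and the reference window at `position`."""
    if position < 0:
        return len(read)  # invalid offset: treat every base as a mismatch
    window = reference[position:position + len(read)]
    if len(window) < len(read):
        return len(read)  # read runs off the end of the reference
    return sum(1 for a, b in zip(read, window) if a != b)


def spot_check(reference: str, alignments: list, sample_size: int,
               max_mismatches: int, seed: int) -> bool:
    """Re-check a random sample of alignments instead of redoing all the work.

    Assumption: the verifier picks `seed` so the volunteer cannot predict it.
    """
    rng = random.Random(seed)
    sample = rng.sample(alignments, min(sample_size, len(alignments)))
    return all(mismatches(reference, a.read, a.position) <= max_mismatches
               for a in sample)


def work_digest(alignments: list) -> str:
    """Order-independent SHA-256 commitment to the reported mapping results."""
    h = hashlib.sha256()
    for a in sorted(alignments, key=lambda x: x.read_id):
        h.update(f"{a.read_id}\t{a.position}\n".encode())
    return h.hexdigest()


def validate_block(reference: str, alignments: list, prev_hash: str, seed: int,
                   sample_size: int = 2, max_mismatches: int = 2):
    """Return a block hash if the sampled work checks out, else None."""
    if not spot_check(reference, alignments, sample_size, max_mismatches, seed):
        return None
    header = f"{prev_hash}:{work_digest(alignments)}"
    return hashlib.sha256(header.encode()).hexdigest()
```

Sampling keeps verification far cheaper than the mapping itself: the verifier inspects a few reads rather than re-aligning the whole batch, at the cost of catching only a fraction of incorrect results per check. If the volunteer could predict the seed, it could compute only the alignments that will be sampled, which is why the sketch assumes the verifier chooses it.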
