DNA.Land: A digital biobank using a massive crowdsourcing approach

Precision medicine necessitates large-scale collections of genomes and phenomes. Despite decreases in the costs of genomic technologies, collecting these types of information at scale is still a daunting task that poses logistical challenges and requires consortium-scale resources. Here, we describe DNA.Land, a digital biobank that collects genomes and phenomes with a fraction of the resources of traditional studies at the same scale. Our approach relies on crowdsourcing data from the rapidly growing number of individuals who have access to their own genomic datasets through Direct-to-Consumer (DTC) companies. To recruit participants, we developed a series of automatic return-of-results features in DNA.Land that increase users’ engagement while satisfying human subject research protections. So far, DNA.Land has collected over 43,000 genomes in 20 months of operation, orders of magnitude more than previous digital attempts by academic groups. We report lessons learned in running a digital biobank, our technical framework, and our approach regarding ethical, legal, and social implications.