Biobanks: the need for standardization

Biobanks are heterogeneous in their design and use, and they range in size from around 1,000 patients to 500,000 or more volunteers. They may contain data and samples from family studies, from patients with a specific disease (ideally with matched controls), from large-scale epidemiologic collections, or from clinical trials of new medical interventions. The samples collected typically include whole blood and its fractions, extracted genomic DNA, whole-cell RNA and urine, and may also include saliva, nail clippings, hair and other tissues and material relevant to the design of specific studies. Inevitably, data and samples are collected under different conditions, to different standards and for different purposes.

Some biobanks take a highly centralized approach to the collection, processing and archiving of samples (for example, UK Biobank [1]), in which participant samples undergo minimal processing at the collection site and are then shipped to a central processing and storage facility. While ensuring robust quality control and data integrity and security, this approach inevitably introduces a delay between collection and cryopreservation that may result in the loss of labile species in the samples. Conversely, other large studies aim to collect and process participant samples as quickly as possible (for example, the American Cancer Society Cancer Prevention Study-3 [2]). Here, samples are collected at fundraising events and in workplace settings and are processed within a few hours by local laboratories before low-temperature archiving. The challenge in this case is to maintain consistency of collection, shipping and processing. A hybrid approach is taken in other studies, in which a proportion of the participant samples are processed and stored locally, with a second set stored in a centralized archive. Here the challenges lie in process consistency, inventory control, and management of the depletable aspects of the resource. This method is being considered for the Helmholtz consortium Biobank, which is under development in Germany.

Not surprisingly, given the challenges of data collection and sample storage within particular studies, there has been little standardization across biobanks. However, a number of international initiatives aim to provide guidance and protocols to address this issue (for example, the DataSHaPER tools developed by the Public Population Project in Genomics (P3G) [3]). The aim is to facilitate data sharing between different resources, thereby increasing effective sample size and statistical power, especially for rare diseases [4]. Rather than striving for uniformity across diverse studies, we believe it is more realistic to focus on developing and testing protocols that produce high-quality data and samples, accompanied by full information describing their collection and processing. In this way, studies will be optimized for the specific questions being investigated, while also potentially contributing to collaborative efforts that draw on samples from several biobanks.