BBMRI-ERIC Directory: 515 Biobanks with Over 60 Million Biological Samples

Biobanks are well-organized repositories of biological material. They have become a fundamental resource for advancing medical research and constitute a major component of bioresources in the broader sense. Yet they face a number of challenges to wider use on the national and global scale. These challenges range from fragmented data structures, sometimes even unavailable data, and a lack of consistent quality management and traceability, to fragmented privacy protection regulations and the technical, organizational, and legal aspects of scalable, secure storage and processing of privacy-sensitive big data. To address the fragmentation and findability aspects, BBMRI-ERIC has released its Directory as its first IT service, providing aggregate information about the biobanks and bioresources. The Directory features a novel scalable distributed architecture, which enables data about changing resources to be updated in a long-term sustainable manner.

Inventory data about the bioresources, describing the availability of various resource types such as biological material, data, expertise, and offered services, are the basis for any further interaction between the biobanks as resource/service providers and their users or collaborators. Various terms have been used for these types of services, including "catalogs" and "registries." Inventory data cover various types of information that are not considered privacy sensitive and are thus shareable in an open-access mode. The business model of a bioresource may nevertheless impose access restrictions. From the users' perspective, it is important to achieve consistent, or at least algorithmically harmonizable, semantics of the information, so that efficient search or filtering services can be implemented. Over the past decade, there have been a number of attempts, both international and national, to improve the availability and consistency of inventory data.
Prominent international examples include the P3G Observatory, the BBMRI Preparatory Phase Catalogue, the ISBER International Resource Locator, the Maelstrom Repository, the BBMRI-LPC catalogs, the RD-CONNECT Catalogue, and the NIH/NCATS GRDR on rare diseases. Although very valuable for helping to organize biobanking and bioresources in projects with limited life spans, these tools also demonstrate the key deficiency of such centrally built and managed systems: because data updates are not automated, the information sooner or later becomes obsolete and thus of limited use to users.

In contrast, distributed information systems are well known in computing infrastructures such as cloud and grid computing systems, where various architectures have been explored, ranging from client-server communication schemes to peer-to-peer systems. The biobanking community needs to learn from these endeavors and take a similar approach with (a) a distributed architecture that allows information to flow from the original sources to the inventory services; (b) well-defined, stable application programming interfaces (APIs) that can be implemented in biobank information management systems; and (c) a clear component-based architecture that allows data extraction and harmonization components to be implemented simply and as close to the original information sources as possible, so that they can draw on in-depth knowledge of the data.
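
The three requirements above can be illustrated with a minimal Python sketch. It is not the Directory's actual implementation or data model: the record fields, the class names (`InventoryRecord`, `LocalAdapter`, `Directory`), and the two local data formats are all hypothetical, chosen only to show how harmonization can sit next to each source while a central service pulls fresh data on demand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class InventoryRecord:
    """Harmonized, non-privacy-sensitive inventory entry (hypothetical schema)."""
    biobank_id: str
    country: str                    # upper-case two-letter code
    material_types: FrozenSet[str]  # lower-case labels
    sample_count: int


class LocalAdapter:
    """Component deployed next to a biobank's own system (assumed design)."""

    def __init__(self, source: Callable[[], Iterable[dict]],
                 harmonize: Callable[[dict], InventoryRecord]) -> None:
        self._source = source
        self._harmonize = harmonize

    def export(self) -> List[InventoryRecord]:
        # Read the current local rows and map each one to the shared schema.
        return [self._harmonize(row) for row in self._source()]


class Directory:
    """Aggregating service that pulls from registered adapters."""

    def __init__(self) -> None:
        self._adapters: Dict[str, LocalAdapter] = {}
        self._records: Dict[str, List[InventoryRecord]] = {}

    def register(self, name: str, adapter: LocalAdapter) -> None:
        self._adapters[name] = adapter

    def refresh(self) -> None:
        # Replace each provider's entries wholesale so stale data disappears.
        for name, adapter in self._adapters.items():
            self._records[name] = adapter.export()

    def search(self, material_type: Optional[str] = None,
               country: Optional[str] = None) -> List[InventoryRecord]:
        hits = []
        for records in self._records.values():
            for rec in records:
                if material_type and material_type.lower() not in rec.material_types:
                    continue
                if country and country.upper() != rec.country:
                    continue
                hits.append(rec)
        return hits


# Two made-up local formats, each with its own harmonization function.
site_a_rows = [{"id": "A-1", "cc": "se", "materials": "DNA;Serum", "n": 1200}]
site_b_rows = [{"name": "B-1", "country": "CZ", "types": ["dna", "plasma"], "samples": "500"}]


def harmonize_a(row: dict) -> InventoryRecord:
    types = frozenset(t.strip().lower() for t in row["materials"].split(";"))
    return InventoryRecord(row["id"], row["cc"].upper(), types, int(row["n"]))


def harmonize_b(row: dict) -> InventoryRecord:
    types = frozenset(t.lower() for t in row["types"])
    return InventoryRecord(row["name"], row["country"].upper(), types, int(row["samples"]))


directory = Directory()
directory.register("site-a", LocalAdapter(lambda: site_a_rows, harmonize_a))
directory.register("site-b", LocalAdapter(lambda: site_b_rows, harmonize_b))
directory.refresh()
```

In this sketch, a call such as `directory.search(material_type="DNA")` filters over harmonized records regardless of how each site stores its data, and calling `directory.refresh()` again picks up any change made at a source. A real deployment would replace the in-process callables with a network API and would need authentication, scheduling, and error handling, none of which is shown here.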
