Efficient data management infrastructure for the integration of imaging and omics data in life science research

As technical developments in omics and biomedical imaging drive the increase in quality, modality, and throughput of data generation in the life sciences, the need for information systems capable of integrative, long-term storage and management of these heterogeneous digital assets is also growing. Here, we propose an approach based on the principles of Service-Oriented Architecture design to allow the integrated management and analysis of multi-omics and biomedical imaging data. The proposed architecture introduces an interoperable image management system, the OMERO server, into the backend of qPortal, a FAIR-compliant web-based platform for omics data management. The implementation of an integrated metadata model, the development of software components that enable communication with the OMERO server, and an extension to the data management operations of established software together allow for FAIR management of heterogeneous omics and biomedical imaging data within an integrated system, which facilitates metadata queries from web-based scientific applications. The applicability of the proposed architecture is demonstrated in two prototypical use cases: a plant biology study using confocal scanning microscopy, and a clinical study on hepatocellular carcinoma with data from a variety of medical imaging and omics modalities. We anticipate that FAIR data management systems for multi-modal data repositories will play a pivotal role in data-driven research, as the joint analysis of omics and imaging data becomes not only desirable but necessary to derive novel insights into biological processes, in particular for powerful machine learning applications, where the availability of large datasets with high-quality phenotypic annotations is a requirement.
