A FAIR-Based Approach to Enhancing the Discovery and Re-Use of Transcriptomic Data Assets for Nuclear Receptor Signaling Pathways

Public transcriptomic assets in the nuclear receptor (NR) signaling field hold considerable collective potential for exposing underappreciated aspects of NR regulation of gene expression. This potential is undermined, however, by a series of enduring informatic pain points that impede the routine re-use of these datasets. Here we describe a coordinated biocuration and web development approach to redress this situation that is closely aligned with the ideals articulated in the FAIR (findable, accessible, interoperable, re-usable) principles of data stewardship. To improve findability, biocurators engage authors of studies in collaborating journals to secure datasets for deposition in public archives. Annotated derivatives of the archived datasets are assigned digital object identifiers and regulatory molecule identifiers that support persistent linkages between datasets and their associated research articles, integration into relevant records in gene and small molecule knowledgebases, and indexing by dataset search engines. To enhance accessibility and interoperability, datasets can be visualized in responsively designed web pages and retrieved either as machine-readable spreadsheets or through an application programming interface. Re-use of the datasets is supported by their interrogation as a universe of data points through the Transcriptomine search engine, which highlights transcriptional intersections between NR signaling pathways, physiological processes and disease states. We illustrate the value of our approach in connecting disparate research communities with a use case of persistent interoperability between the Nuclear Receptor Signaling Atlas and the Pharmacogenomics Knowledgebase. Our FAIR-aligned model demonstrates the enduring value of discovery-scale datasets that accrues from their systematic compilation, biocuration and distribution across the digital biomedical research enterprise.
