A multi-omics digital research object for the genetics of sleep regulation

More and more researchers use multi-omics approaches to tackle complex cellular and organismal systems. It has become apparent that re-using and integrating data generated by different labs can enhance knowledge. However, meaningful and efficient re-use of data generated by others is difficult to achieve without an in-depth understanding of how these datasets were assembled. We therefore designed, and here describe in detail, a digital research object embedding data, documentation, and analytics on mouse sleep regulation. The aim of this study was to bring together electrophysiological recordings, sleep-wake behavior, metabolomics, genetics, and gene regulatory data in a systems genetics model to investigate sleep regulation in the BXD panel of recombinant inbred lines. Here we showcase both the advantages and limitations of providing such multi-modal data and analytics. The reproducibility of the results was tested by a bioinformatician not involved in the original project, and the robustness of the results was assessed by re-annotating genetic and transcriptome data from the mm9 to the mm10 mouse genome assembly.
