RImmPort: enabling ready-for-analysis immunology research data

Broad open access to data from entire clinical research studies is on the rise. Public access to raw clinical research data has created tremendous opportunities to evaluate new research hypotheses that were not originally formulated in the studies: by reanalyzing data from a single study, by performing cross-analyses of multiple studies, or by combining study data with other public research datasets. But such analysis of disparate data presupposes a) uniform representation of research data using data standards, and b) easy access to such standard representations of clinical research data in analytical environments. The Immunology Database and Analysis Portal (ImmPort: immport.niaid.nih.gov) system [1] warehouses clinical study data from all areas of immunology that is generated by scientific researchers supported by the National Institute of Allergy and Infectious Diseases (NIAID) / Division of Allergy, Immunology and Transplantation (DAIT). Currently, over 100 studies are publicly available in ImmPort. Under the sponsorship of the ImmPort project, we are developing RImmPort, which prepares ImmPort data for analysis in the open-source R statistical environment. RImmPort comprises four main components: 1) a specification of R study classes that encapsulate study data; the specification leverages study data standards from the Clinical Data Interchange Standards Consortium (CDISC) and incorporates the terms and semantics found in these standards; 2) foundational methods to load data for a specific study in ImmPort, which create R objects based on the R study classes, access the study data in ImmPort, and populate the R objects with the downloaded data; 3) generic methods to slice and dice data across different dimensions of study data; and 4) custom methods to combine specific types of study data from multiple studies.
Thus, RImmPort hides the complexities and idiosyncrasies of the ImmPort data repository model and provides easy access to study data in a structure that is conducive to analysis. Using RImmPort, an entire study can be loaded into R with a single command. For example, a researcher interested in analyzing the study ImmPort:SDY1 can use RImmPort to access different types of individual-level data: subject demographics, clinical assessments, adverse events, and the results of flow cytometry and ELISA experiments on 4211 biosamples collected from 159 subjects at different time points over 12 weeks. By basing RImmPort on open formalisms such as the CDISC standards and by making it available on open-source bioinformatics platforms such as Bioconductor, we ensure that clinical study data in ImmPort is ready for analysis, thus enabling innovative bioinformatics research in immunology.
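The abstract does not show RImmPort's R interface, and the sketch below does not attempt to reproduce it. It is a minimal Python illustration of the design described above, under stated assumptions. A study object groups its records by CDISC SDTM domain code. A loader builds that object. Two slicing methods cut the data by data type or by subject. A cross-study function pools one domain from several studies. DM (Demographics) and AE (Adverse Events) are standard SDTM domain codes, and USUBJID and STUDYID are standard SDTM variable names. The class name `Study`, the functions `load_study` and `combine_domain`, the dictionary layout, and the study IDs `DEMO1`/`DEMO2` are all hypothetical. The records are synthetic toy data, not ImmPort data.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# One row of an SDTM-style domain table, e.g. {"USUBJID": "S01", "AGE": 34}.
Record = Dict[str, object]


@dataclass
class Study:
    """Hypothetical study container: records grouped by SDTM domain code."""
    study_id: str
    domains: Dict[str, List[Record]] = field(default_factory=dict)

    def domain(self, code: str) -> List[Record]:
        # Slice by data type: all records of one domain ("DM", "AE", ...).
        return self.domains.get(code, [])

    def subject(self, usubjid: str) -> Dict[str, List[Record]]:
        # Slice by subject: that subject's records in every domain.
        return {
            code: [r for r in rows if r.get("USUBJID") == usubjid]
            for code, rows in self.domains.items()
        }


def load_study(study_id: str, raw: Dict[str, List[Record]]) -> Study:
    # Hypothetical loader: `raw` stands in for data downloaded from ImmPort.
    # Each record is tagged with STUDYID so provenance survives later merges.
    return Study(
        study_id,
        {code: [{**r, "STUDYID": study_id} for r in rows] for code, rows in raw.items()},
    )


def combine_domain(studies: List[Study], code: str) -> List[Record]:
    # Cross-study method: pool one domain from several studies.
    pooled: List[Record] = []
    for s in studies:
        pooled.extend(s.domain(code))
    return pooled


# Synthetic toy data (not ImmPort data), for illustration only.
a = load_study("DEMO1", {
    "DM": [{"USUBJID": "S01", "AGE": 34, "SEX": "F"},
           {"USUBJID": "S02", "AGE": 41, "SEX": "M"}],
    "AE": [{"USUBJID": "S01", "AETERM": "Headache"}],
})
b = load_study("DEMO2", {"DM": [{"USUBJID": "S10", "AGE": 29, "SEX": "F"}]})

print(len(a.domain("DM")))                # 2
print(a.subject("S01")["AE"])             # [{'USUBJID': 'S01', 'AETERM': 'Headache', 'STUDYID': 'DEMO1'}]
print(len(combine_domain([a, b], "DM")))  # 3
```

Tagging every record with STUDYID at load time is a deliberate choice. After pooling, each row still records which study it came from. A domain that a study lacks, such as AE in `DEMO2`, simply contributes no rows.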