Integrated Computing and Tracking System for Centralized High-Throughput Genetic Analysis: a Case Study

The Genetic Analysis Center (GAC) of the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) developed an Integrated Computing and Tracking system (ICT) to perform genome-wide and other genetic association studies automatically and efficiently while documenting all analysis specifications. Through integration with an on-site database, the system provides easy-to-use analysis set-up and computing procedures, automatic reports, and analysis search functionality. In this paper, we describe the ICT and demonstrate how it satisfies key principles of reproducible research while respecting the constraints and challenges of working with very large, restricted-access, human-subjects data. This case study may benefit other groups with similar requirements for high-throughput analysis execution and management.
