MaPSeq, A Service-Oriented Architecture for Genomics Research within an Academic Biomedical Research Institution

Genomics research presents well-recognized technical, computational, and analytical challenges. Less recognized are the complex sociological, psychological, cultural, and political challenges that arise when genomics research takes place within a large, decentralized academic institution. In this paper, we describe MaPSeq, a Service-Oriented Architecture (SOA) that was conceptualized and designed to meet the diverse and evolving computational workflow needs of genomics researchers at our large, hospital-affiliated academic research institution. We present the institutional challenges that motivated the design of MaPSeq before describing its architecture and functionality. We then discuss SOA solutions and conclude that approaches such as MaPSeq enable efficient and effective computational workflow execution for genomics research and for any type of academic biomedical research that requires complex, computationally intensive workflows.