FlowGate: towards extensible and scalable web-based flow cytometry data analysis

Recent advances in cytometry instrumentation are enabling the generation of "big data" at the single-cell level for the identification of cell-based biomarkers, which will fundamentally change the current paradigm of diagnosis and personalized treatment of immune system disorders, cancers, and blood diseases. However, traditional flow cytometry (FCM) data analysis based on manual gating cannot scale to this new level of data generation. Computational data analysis methods have recently been developed to cope with the increasing volume and dimensionality of data generated by FCM experiments. Making these methods easily accessible to clinicians and experimentalists is one of the biggest challenges that algorithm developers and bioinformaticians need to address. This paper describes FlowGate, a novel prototype cyberinfrastructure for web-based FCM data analysis, which integrates graphical user interfaces (GUIs), workflow engines, and parallel computing resources for extensible and scalable FCM data analysis. The goal of FlowGate is to give users easy access, on a single platform, to state-of-the-art FCM computational methods developed in different programming languages and software environments, provided that the implementations of these methods follow standardized I/O. By adopting existing data and information standards, FlowGate can also be integrated as the back-end data analysis platform for existing immunology and FCM databases. Experimental runs of two representative FCM data analysis methods in FlowGate on different cluster computers demonstrated that task runtime decreases linearly as the number of compute cores used in the analysis increases.
