FloReMi: Flow density survival regression using minimal feature redundancy

Advances in flow cytometry bioinformatics have produced a wide variety of clustering, classification, and visualization techniques. Common benchmarks such as the FlowCAP initiative have proven to be of great value for evaluating the performance of such methods objectively. In this work, we report on a novel method, FloReMi, which was developed to tackle the most recent FlowCAP IV challenge. This challenge was formulated as a survival modeling problem: participants were expected to design a model that predicts the time until progression to AIDS for HIV patients. Adequately estimating the progression rate of HIV patients is crucial in their treatment. It is known that variability in progression rate cannot be fully predicted by simple CD4+ T cell counts; however, it is hypothesized that the immunopathogenesis established early in HIV infection already indicates the course of future disease. We built an automated pipeline that preprocesses the data, identifies and selects informative cell subsets, and fits a survival regression model based on random survival forests. This approach obtained the best results of all approaches submitted to the FlowCAP IV challenge. © 2015 International Society for Advancement of Cytometry
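The "minimal feature redundancy" idea in the title can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state how redundancy is measured, so the Pearson-correlation criterion, the 0.5 threshold, the function names, and the toy feature values below are all assumptions chosen for illustration. The sketch ranks candidate cell-subset features by a precomputed relevance score and greedily keeps a feature only if it is weakly correlated with every feature already kept.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_low_redundancy(features, scores, max_features=3, max_abs_corr=0.5):
    """Greedy selection: highest score first, skipping redundant features.

    features: dict mapping feature name -> list of per-sample values
    scores:   dict mapping feature name -> relevance score (higher is better);
              how the score is obtained is left open here (assumption)
    """
    ranked = sorted(features, key=lambda name: scores[name], reverse=True)
    selected = []
    for name in ranked:
        if len(selected) >= max_features:
            break
        # Keep the candidate only if it is weakly correlated with all kept features.
        if all(abs(pearson(features[name], features[kept])) <= max_abs_corr
               for kept in selected):
            selected.append(name)
    return selected


# Hypothetical toy data: three cell-subset features measured on five samples.
features = {
    "subset_a": [0.10, 0.20, 0.30, 0.40, 0.50],
    "subset_b": [0.11, 0.19, 0.32, 0.41, 0.49],  # near-duplicate of subset_a
    "subset_c": [0.50, 0.10, 0.40, 0.20, 0.30],
}
scores = {"subset_a": 0.9, "subset_b": 0.8, "subset_c": 0.5}

selected = select_low_redundancy(features, scores)
print(selected)
```

In this toy example the near-duplicate `subset_b` is skipped despite its high score, because it carries almost the same information as `subset_a`. The selected features would then be passed on to a survival model such as a random survival forest.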
