A pipeline for automated analysis of flow cytometry data: Preliminary results on lymphoma sub-type diagnosis

Flow cytometry (FCM) is a technique, widely used in health research, for measuring cell properties such as phenotype and cytokine expression for up to millions of cells per sample. Manual analysis of FCM data is tedious, subjective, and time-consuming (to the point of impracticality for some data), and it relies on intuition rather than standardized statistical inference. This study proposes a pipeline for automated analysis of FCM data. The pipeline identifies biomarkers that correlate with physiological or pathological conditions and assigns samples to specific physiological or pathological entities. It first uses a model-based clustering approach to identify cell populations that share similar biological functions. Support vector machine (SVM) and random forest (RF) classifiers are then used to classify the samples and to identify biomarkers associated with disease status. The performance of the pipeline was evaluated on data from lymphoma patients. Preliminary results show more than 90% accuracy in differentiating between some lymphoma subtypes. The pipeline also finds biologically meaningful biomarkers that differ between lymphoma subtypes.
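
The two stages described above (model-based clustering of cells, then classification of samples from cluster-derived features) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration and is not taken from the study: the data are synthetic single-marker intensities, the clustering is a two-component, one-dimensional Gaussian mixture fitted by EM, and a simple midpoint threshold on the "positive cell" fraction stands in for the SVM and RF classifiers. The function names (`fit_two_gaussians`, `positive_fraction`, `make_sample`, `run_demo`) and the subtype labels "A" and "B" are hypothetical.

```python
import math
import random


def fit_two_gaussians(values, iters=30):
    """Fit a 1-D, two-component Gaussian mixture with EM (toy model-based clustering)."""
    lo, hi = min(values), max(values)
    means = [lo, hi]                      # component 0 starts low, component 1 starts high
    spread = (hi - lo) / 4 or 1.0
    sds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability that each value belongs to component 1
        resp = []
        for v in values:
            dens = [w * math.exp(-0.5 * ((v - m) / s) ** 2) / s
                    for w, m, s in zip(weights, means, sds)]
            resp.append(dens[1] / (sum(dens) or 1e-300))
        # M-step: re-estimate weight, mean and standard deviation of each component
        for k in (0, 1):
            r = [p if k == 1 else 1.0 - p for p in resp]
            n_k = sum(r) or 1e-12
            means[k] = sum(ri * v for ri, v in zip(r, values)) / n_k
            var = sum(ri * (v - means[k]) ** 2 for ri, v in zip(r, values)) / n_k
            sds[k] = max(math.sqrt(var), 1e-3)
            weights[k] = n_k / len(values)
    return weights, means, sds


def positive_fraction(cells, model):
    """Per-sample feature: fraction of cells assigned to the high-intensity component."""
    weights, means, sds = model
    count = 0
    for v in cells:
        d0, d1 = (w * math.exp(-0.5 * ((v - m) / s) ** 2) / s
                  for w, m, s in zip(weights, means, sds))
        count += d1 > d0
    return count / len(cells)


def make_sample(rng, frac_positive, n_cells=300):
    """Synthetic sample: a mix of marker-negative and marker-positive cells."""
    return [rng.gauss(4.0, 0.5) if rng.random() < frac_positive else rng.gauss(1.0, 0.5)
            for _ in range(n_cells)]


def run_demo(seed=0):
    rng = random.Random(seed)
    fracs = {"A": 0.2, "B": 0.6}          # hypothetical subtypes and positive-cell fractions
    train = [(lab, make_sample(rng, f)) for lab, f in fracs.items() for _ in range(10)]
    test = [(lab, make_sample(rng, f)) for lab, f in fracs.items() for _ in range(5)]

    # Stage 1: cluster the pooled training cells into two populations
    model = fit_two_gaussians([v for _, cells in train for v in cells])

    # Stage 2: classify samples from the cluster-derived feature.
    # A midpoint threshold stands in here for the SVM / RF classifiers.
    class_means = {}
    for lab in fracs:
        feats = [positive_fraction(c, model) for l, c in train if l == lab]
        class_means[lab] = sum(feats) / len(feats)
    threshold = (class_means["A"] + class_means["B"]) / 2

    def predict(cells):
        return "B" if positive_fraction(cells, model) > threshold else "A"

    accuracy = sum(predict(c) == lab for lab, c in test) / len(test)
    return model, threshold, accuracy


if __name__ == "__main__":
    model, threshold, accuracy = run_demo()
    print("component means:", model[1])
    print("threshold:", threshold, "held-out accuracy:", accuracy)
```

In a realistic setting the mixture would be multivariate, with one dimension per measured marker, and the per-sample feature vector (for example, the proportion of cells in each cluster) would be passed to library implementations of SVM and RF. The feature importances reported by such classifiers are one plausible route to the biomarker identification the abstract mentions.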
