From Cellular Characteristics to Disease Diagnosis: Uncovering Phenotypes with Supercells

Cell heterogeneity and the inherent complexity arising from the interplay of multiple molecular processes within the cell pose difficult challenges for current single-cell biology. We introduce an approach, based on the concept of "supercell statistics", that identifies a disease phenotype from multiparameter single-cell measurements: measurements are first averaged over groups of single cells, and the resulting averages are then classified with a machine learning scheme. This lets us assess the optimal tradeoff between the number of single cells averaged and the number of measurements needed to capture phenotypic differences between healthy and diseased patients, as well as between different diseases that are otherwise difficult to diagnose. We apply our approach to two kinds of single-cell datasets: images of cell nuclei, for the diagnosis of a premature aging disorder, and multicolor flow cytometry, for the phenotypes of two non-infectious uveitides (the ocular manifestations of Behçet's disease and sarcoidosis). In the former case, a single nuclear shape measurement averaged over a group of 30 cells is sufficient to classify samples as healthy or diseased, in agreement with usual laboratory practice. In the latter, our method identifies a minimal set of 5 markers that accurately predict Behçet's disease and sarcoidosis. This is the first time that a quantitative phenotypic distinction between these two diseases has been achieved. To obtain this clear phenotypic signature, about one hundred CD8+ T cells need to be measured. Although the molecular markers identified have been reported to be important players in autoimmune disorders, this is the first report showing that CD8+ T cells can be used to distinguish two systemic inflammatory diseases.
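The supercell idea can be illustrated with a minimal sketch. It assumes that each patient sample is a list of per-cell feature vectors and that a supercell is the feature-wise mean of `n` cells drawn at random from one sample. The nearest-centroid rule is only a stand-in, since the abstract does not specify the actual classifier. Every function name here (`make_supercells`, `supercell_accuracy`, `synthetic_sample`) is hypothetical, and the data are synthetic rather than taken from the study. Sweeping `n` mimics the tradeoff described above: single cells from the two classes overlap heavily, but their averages separate as `n` grows.

```python
import random
from statistics import mean


def make_supercells(cells, n, k, rng):
    """Return k supercells, each the feature-wise mean of n cells drawn
    without replacement from one sample's list of per-cell feature vectors."""
    supercells = []
    for _ in range(k):
        group = rng.sample(cells, n)
        supercells.append([mean(col) for col in zip(*group)])
    return supercells


def centroid(vectors):
    """Feature-wise mean of a list of vectors."""
    return [mean(col) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(vector, centroids):
    """Nearest-centroid rule (a stand-in classifier): pick the closest class."""
    return min(centroids, key=lambda label: sq_dist(vector, centroids[label]))


def supercell_accuracy(train, test, n, k, seed=0):
    """Fit class centroids on supercells from training samples, then return
    the fraction of supercells from held-out samples that are labeled correctly.

    train, test: lists of (label, cells) pairs, one pair per patient sample.
    """
    rng = random.Random(seed)
    by_label = {}
    for label, cells in train:
        by_label.setdefault(label, []).extend(make_supercells(cells, n, k, rng))
    centroids = {label: centroid(vs) for label, vs in by_label.items()}
    correct = total = 0
    for label, cells in test:
        for sc in make_supercells(cells, n, k, rng):
            correct += classify(sc, centroids) == label
            total += 1
    return correct / total


def synthetic_sample(shift, n_cells, rng):
    """Hypothetical data: one feature per cell, unit-variance Gaussian noise
    around a class-dependent mean, so single cells from two classes overlap."""
    return [[rng.gauss(shift, 1.0)] for _ in range(n_cells)]


if __name__ == "__main__":
    rng = random.Random(42)

    def cohort(count):
        # count healthy and count diseased samples of 200 cells each
        return ([("healthy", synthetic_sample(0.0, 200, rng)) for _ in range(count)]
                + [("diseased", synthetic_sample(1.0, 200, rng)) for _ in range(count)])

    train, test = cohort(3), cohort(2)
    # Sweep the supercell size n to see how many cells must be averaged.
    for n in (1, 5, 30):
        print(f"n={n:2d}  accuracy={supercell_accuracy(train, test, n=n, k=50):.2f}")
```

Samples are split into training and test sets so that test supercells never come from cells seen during training. The same loop could also iterate over subsets of features to explore the other side of the tradeoff (how many measurements are needed). The sketch does not attempt that.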
Beyond these specific cases, the approach proposed here applies to datasets generated by other current and forthcoming single-cell technologies, such as multidimensional mass cytometry, single-cell gene expression profiling, and single-cell whole-genome sequencing.