Model for comparative analysis of antigen receptor repertoires

In modern molecular biology, one standard way of analyzing a vertebrate immune system is to sequence and compare the counts of specific antigen receptor clones (either immunoglobulins or T-cell receptors) derived from various tissues under different experimental or clinical conditions. The resulting statistical problems are difficult and do not fit readily into the standard framework of contingency tables, primarily because of severe under-sampling of the receptor populations. This under-sampling is caused, on the one hand, by the extreme diversity of the antigen receptor repertoires maintained by the immune system and, on the other, by the high cost and labor intensity of receptor data collection. In most of the recent immunological literature, differences across antigen receptor populations are examined via non-parametric measures of species overlap and diversity borrowed from ecological studies. While this approach is robust in a wide range of situations, it appears to provide little insight into the underlying clonal size distribution or the overall mechanism differentiating the receptor populations. As an alternative, the current paper presents a parametric method that adjusts for data under-sampling and provides a unified approach to the simultaneous comparison of multiple receptor groups by means of modern unsupervised learning tools. The parametric model is based on a flexible multivariate Poisson-lognormal distribution and is a natural generalization of the univariate Poisson-lognormal models used in ecological studies of biodiversity patterns. The procedure for evaluating the model's fit is described, along with the public-domain software developed to perform the necessary diagnostics. When applied to T-cell receptor data from populations of transgenic mice, the model-driven analysis compares favorably with traditional methods.
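To make the model family concrete, here is a minimal, hedged sketch in Python (standard library only) of a Poisson-lognormal clone-size model. In the univariate case a clone's log-intensity z is Normal(mu, sigma^2) and its observed count is Poisson(exp(z)). In the multivariate case the log-intensities across tissues are jointly normal with covariance L L^T, and the counts are conditionally independent Poisson. All function names, the trapezoid-rule integration, and the use of zero truncation to represent unseen clones are assumptions made for illustration. They are not the paper's software or its actual estimation and under-sampling adjustment.

```python
import math
import random


def poisson_lognormal_pmf(k, mu, sigma, n_grid=2001, width=8.0):
    """P(X = k) when z = log(lam) ~ Normal(mu, sigma^2) and X | lam ~ Poisson(lam).

    The mixing integral over z is approximated by the trapezoid rule on
    [mu - width*sigma, mu + width*sigma] (an illustrative numerical choice).
    """
    lo = mu - width * sigma
    h = 2.0 * width * sigma / (n_grid - 1)
    log_norm_const = -math.log(sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(n_grid):
        z = lo + i * h
        # log Poisson(k; e^z) + log Normal(z; mu, sigma^2), combined before exp
        log_term = (k * z - math.exp(z) - math.lgamma(k + 1)
                    - 0.5 * ((z - mu) / sigma) ** 2 + log_norm_const)
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * math.exp(log_term)
    return total * h


def zero_truncated_loglik(counts, mu, sigma):
    """Log-likelihood of observed clone sizes (all >= 1).

    Assumption of this sketch: clones with zero reads are never seen, so each
    observed count is modelled by P(X = k | X > 0) = p(k) / (1 - p(0)).
    """
    p0 = poisson_lognormal_pmf(0, mu, sigma)
    return sum(math.log(poisson_lognormal_pmf(k, mu, sigma)) - math.log1p(-p0)
               for k in counts)


def poisson_draw(lam, rng):
    """Poisson(lam) variate: count unit-rate exponential arrivals in [0, lam]."""
    n, t = 0, rng.expovariate(1.0)
    while t <= lam:
        n += 1
        t += rng.expovariate(1.0)
    return n


def sample_mpln(mu, chol, rng):
    """One count vector from a multivariate Poisson-lognormal distribution.

    z = mu + L e with e ~ N(0, I) and L lower-triangular, so z ~ N(mu, L L^T);
    then X_j | z ~ Poisson(exp(z_j)) independently across coordinates j.
    """
    d = len(mu)
    e = [rng.gauss(0.0, 1.0) for _ in range(d)]
    z = [mu[i] + sum(chol[i][j] * e[j] for j in range(i + 1)) for i in range(d)]
    return [poisson_draw(math.exp(zi), rng) for zi in z]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical tissues whose log clone sizes have correlation 0.8.
    chol = [[1.0, 0.0], [0.8, 0.6]]
    clones = [sample_mpln([0.0, 0.0], chol, rng) for _ in range(5000)]
    # Only clones with at least one read in some tissue would be sequenced.
    observed = [c for c in clones if any(c)]
    print(len(observed), "of", len(clones), "simulated clones observed")
```

Running the simulation shows the under-sampling issue in miniature. A sizable fraction of the simulated clones receives zero reads in both tissues and would simply be absent from the data, which is why this sketch conditions the likelihood on positive counts.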
