Using affinity propagation for identifying subspecies among clonal organisms: lessons from M. tuberculosis

Background: Classification and naming are key steps in the analysis, understanding, and adequate management of living organisms. However, deciding where to set the limits between groups can be puzzling, especially in clonal organisms. Within the Mycobacterium tuberculosis complex (MTC), the etiological agent of tuberculosis (TB), experts first identified several groups according to their patterns at repetitive sequences, especially at the CRISPR locus (spoligotyping), and to their epidemiological relevance. Most groups, such as "Beijing", were well supported when tested with other loci. However, other groups, such as the T family and the T1 subfamily (belonging to the "Euro-American" lineage), correspond to non-monophyletic groups and still need to be refined. Here, we propose to use a method called Affinity Propagation, which has been successfully used in image categorization, to identify relevant patterns at the CRISPR locus in MTC.

Results: To adequately infer the relative divergence time between strains, we used a distance method inspired by the recent evolutionary model by Reyes et al. First, we confirm that this method performs better than the Jaccard index commonly used to compare spoligotype patterns. Second, we document the support of each spoligotype family in the previous classification using affinity propagation on the international spoligotyping database SpolDB4. This allowed us to propose a consensus assignment for all SpolDB4 spoligotypes. Third, we propose new signatures to subclassify the T family.

Conclusion: Altogether, this study shows how the clustering algorithm Affinity Propagation can help build or refine classifications of clonal organisms. It also describes well-supported families and subfamilies within the M. tuberculosis complex, especially inside the modern "Euro-American" lineage.
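As a rough illustration of the approach summarized above, the following sketch clusters a handful of spoligotype patterns with a minimal pure-Python version of the affinity propagation message-passing updates. It rests on several assumptions: the six 43-spacer patterns are hypothetical toy inputs (a "Beijing-like" pattern lacking spacers 1-34, a "T-like" pattern lacking spacers 33-36, and two single-spacer variants of each), not SpolDB4 entries; the similarity is the plain Jaccard index, that is, the baseline mentioned in the abstract rather than the authors' distance inspired by Reyes et al.; and the preference (self-similarity) is set to the median pairwise similarity, a common default rather than a setting taken from the study.

```python
from statistics import median


def jaccard(a, b):
    """Jaccard index of two spoligotypes given as strings of '1' (spacer present) and '0'."""
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    return both / either if either else 1.0


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each item i, the index of its exemplar.

    S is a square similarity matrix; S[k][k] holds the preference of item k.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i, k) = min(0, r(k, k) + sum of positive r(i', k), i' not in {i, k})
        # a(k, k) = sum of positive r(i', k), i' != k
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(positive) - positive[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Hypothetical toy patterns (43 spacers each); assumptions, not SpolDB4 records.
patterns = [
    "0" * 34 + "1" * 9,                   # Beijing-like: spacers 1-34 absent
    "0" * 34 + "1" * 8 + "0",             # variant also lacking spacer 43
    "0" * 35 + "1" * 8,                   # variant also lacking spacer 35
    "1" * 32 + "0" * 4 + "1" * 7,         # T-like: spacers 33-36 absent
    "0" + "1" * 31 + "0" * 4 + "1" * 7,   # variant also lacking spacer 1
    "1" * 32 + "0" * 4 + "1" * 6 + "0",   # variant also lacking spacer 43
]

n = len(patterns)
S = [[jaccard(patterns[i], patterns[k]) for k in range(n)] for i in range(n)]
preference = median(S[i][k] for i in range(n) for k in range(i + 1, n))
for k in range(n):
    S[k][k] = preference  # assumed default: median pairwise similarity

exemplars = affinity_propagation(S)
print(exemplars)
```

On this toy input the Beijing-like and T-like patterns should end up under different exemplars. Lowering the preference tends to yield fewer clusters and raising it yields more, which is the main tuning knob when exploring how well a proposed family is supported.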
