From IMGT-ONTOLOGY to IMGT/LIGMotif: the IMGT® standardized approach for immunoglobulin and T cell receptor gene identification and description in large genomic sequences

Background

The antigen receptors, immunoglobulins (IG) and T cell receptors (TR), are specific molecular components of the adaptive immune response of vertebrates. Their genes are organized in the genome in several loci (7 in humans) that comprise different gene types: variable (V), diversity (D), joining (J) and constant (C) genes. Synthesis of the IG and TR proteins requires rearrangements of V and J, or V, D and J genes at the DNA level, followed by splicing at the RNA level of the rearranged V-J and V-D-J genes to C genes. Because of the particular IG and TR gene structures that these molecular mechanisms entail, conventional bioinformatics software and tools are not suited to the identification and description of IG and TR genes in large genomic sequences. To answer that need, IMGT®, the international ImMunoGeneTics information system®, has developed IMGT/LIGMotif, a tool for IG and TR gene annotation. This tool is based on standardized rules defined in IMGT-ONTOLOGY, the first ontology in immunogenetics and immunoinformatics.

Results

IMGT/LIGMotif currently annotates human and mouse IG and TR loci in large genomic sequences. The annotation includes gene identification and orientation on the DNA strand, description of the V, D and J genes by assigning IMGT® labels, gene functionality and, finally, gene delimitation and cluster assembly. IMGT/LIGMotif analyses sequences of up to 2.5 megabase pairs and can process them in batch files.

Conclusions

IMGT/LIGMotif is currently used by the IMGT® biocurators to annotate, in a first step, IG and TR genomic sequences of human and mouse in new haplotypes, and those of closely related species (nonhuman primates and rat, respectively). In a next step, following enrichment of its reference databases, IMGT/LIGMotif will be used to annotate IG and TR genes of more distantly related vertebrate species. IMGT/LIGMotif is available at http://www.imgt.org/ligmotif/.
