Towards more robust methods of alien gene detection

Because the properties of horizontally transferred genes reflect the mutational proclivities of their donor genomes, these genes often show atypical compositional properties relative to native genes. Parametric methods use these discrepancies to identify bacterial genes recently acquired by horizontal transfer. However, the compositional patterns of native genes vary stochastically, leaving no clear boundary between typical and atypical genes. As a result, while strongly atypical genes are readily identified as alien, genes of ambiguous character are poorly classified when a single threshold separates typical from atypical genes. This limitation affects all parametric methods that examine genes independently, and escaping it requires additional genomic information. We propose that the performance of all parametric methods can be improved by a multiple-threshold approach. First, strongly atypical alien genes and strongly typical native genes are identified using conservative thresholds. Genes with ambiguous compositional features are then classified by examining gene context, including the class (native or alien) of flanking genes. By including additional genomic information in a multiple-threshold framework, we observed a remarkable improvement in the performance of several popular, but algorithmically distinct, methods for alien gene detection.
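
The two-stage idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: each gene is taken to have a single atypicality score (higher means more atypical) produced by some parametric method, the names `first_pass`, `nearest_confident`, and `classify_genes` are hypothetical, and the context rule (an ambiguous gene is called alien only if every confidently classified flanking gene is alien) is one simple choice among many possible ones.

```python
from typing import List, Optional, Sequence


def first_pass(scores: Sequence[float], lower: float, upper: float) -> List[str]:
    """Stage 1: apply two conservative thresholds to per-gene atypicality scores."""
    labels = []
    for score in scores:
        if score >= upper:
            labels.append("alien")      # strongly atypical
        elif score <= lower:
            labels.append("native")     # strongly typical
        else:
            labels.append("ambiguous")  # left for the context step
    return labels


def nearest_confident(labels: Sequence[str], start: int, step: int) -> Optional[str]:
    """Return the class of the nearest non-ambiguous gene in one direction, if any."""
    i = start + step
    while 0 <= i < len(labels):
        if labels[i] != "ambiguous":
            return labels[i]
        i += step
    return None


def classify_genes(scores: Sequence[float], lower: float, upper: float) -> List[str]:
    """Stage 2: resolve ambiguous genes from the classes of their flanking genes.

    Assumed rule (illustrative only): an ambiguous gene is labeled alien when
    all confidently classified flanking genes are alien; otherwise native.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    first = first_pass(scores, lower, upper)
    final = list(first)
    for i, label in enumerate(first):
        if label != "ambiguous":
            continue
        flanks = [
            flank
            for flank in (nearest_confident(first, i, -1), nearest_confident(first, i, +1))
            if flank is not None
        ]
        is_alien = bool(flanks) and all(flank == "alien" for flank in flanks)
        final[i] = "alien" if is_alien else "native"
    return final


# Genes are listed in genome order; scores and thresholds are made-up values.
example_scores = [0.1, 0.9, 0.5, 0.95, 0.2, 0.5, 0.1]
example_labels = classify_genes(example_scores, lower=0.3, upper=0.8)
```

In this toy input, the ambiguous gene at index 2 sits between two confidently alien genes and is therefore called alien, while the one at index 5 is flanked by native genes and is called native. A single threshold at 0.5 would have had to treat both genes identically.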
