Detecting LTR structures in human genomic sequences using profile hidden Markov models

More than 45% of the human genome has been annotated as transposable elements (TEs). Mobilization of these TEs has expanded the human genome and may increase its plasticity and variation. Long terminal repeat (LTR) retrotransposons are an important class of TEs. LTRs contain regulatory sites, which the authors believe could be conserved in evolution. Significant motifs in LTR sequences are therefore identified and used to train profile hidden Markov models (HMMs). These models serve as fingerprints that detect most of the known LTRs annotated by RepeatMasker, and they are also used to classify LTR instances into families, which can support evolutionary analysis. In summary, a new method of detecting LTRs is proposed: analysis of LTR sequences reveals specific motifs that act as LTR fingerprints and can be built into HMM profiles. Experimental results show that the proposed approach not only recovers most of the LTRs found by RepeatMasker but also detects some novel LTRs, which may be structurally incomplete or degenerate.
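The fingerprint idea can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it keeps only the match states of a profile HMM (no insert or delete states, so no gaps), estimates per-column log-odds scores from a few aligned motif instances, and slides the resulting profile along a sequence. The motif instances, the uniform background of 0.25, the pseudocount, the threshold, and the function names `build_profile` and `scan` are all assumptions made for illustration, and the input is assumed to contain only uppercase A, C, G, and T.

```python
import math

ALPHABET = "ACGT"

def build_profile(aligned_motifs, pseudocount=1.0, background=0.25):
    """Return per-column log-odds scores from equal-length motif instances.

    This is an ungapped, match-state-only simplification of a profile HMM.
    """
    length = len(aligned_motifs[0])
    profile = []
    for col in range(length):
        # Start every base at the pseudocount so unseen bases keep a finite score.
        counts = {base: pseudocount for base in ALPHABET}
        for motif in aligned_motifs:
            counts[motif[col]] += 1
        total = sum(counts.values())
        profile.append(
            {b: math.log2((counts[b] / total) / background) for b in ALPHABET}
        )
    return profile

def scan(sequence, profile, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[base] for col, base in zip(profile, window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical motif instances and query sequence, chosen only for illustration.
motifs = ["TATAAA", "TATAAT", "TATATA"]
profile = build_profile(motifs)
hits = scan("GGCGTATAAAGCGC", profile, threshold=3.0)
print(hits)
```

A full profile HMM would add insert and delete states so that degenerate or structurally incomplete copies can still align. Family assignment could then be sketched as scoring a candidate against one profile per family and taking the best-scoring one.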
