LCR_Finder: A de Novo Low Copy Repeat Finder for Human Genome

Low copy repeats (LCRs) are reported to trigger and mediate genomic rearrangements, which may result in genetic diseases. Detecting LCRs therefore helps in investigating the mechanisms of these diseases. Because of the complex structure of LCRs, existing tools for detecting genomic structural variations (SVs) and segmental duplications (SDs) have difficulty predicting LCR copies in full length, especially for large LCRs or those involving complex SVs. We developed LCR_Finder, a de novo computational tool that can predict large-scale (>100 kb) complex LCRs in a human genome. Technically, LCR_Finder exploits fast read alignment tools: it first generates overlapping reads from the given genome, then aligns the reads back to the genome and identifies potential repeat regions from reads that map to multiple locations. By clustering and extending these regions, it predicts potential complex LCRs. We evaluated LCR_Finder on human chromosomes, where it identified 4 known disease-related LCRs and predicted a few more possible novel LCRs. We also showed that existing tools designed for finding repeats in a genome, such as RepeatScout and WindowMasker, are not able to identify LCRs, and that tools designed for detecting SDs cannot report large-scale, full-length complex LCRs.
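The pipeline outlined in the abstract (generate overlapping reads, map them back, keep multi-mapped reads, cluster the hits) can be illustrated with a toy sketch. This is not the LCR_Finder implementation: an exact-match substring index stands in for a real read aligner, the extension step is reduced to merging nearby hits, and all function names and parameters (`generate_reads`, `multi_mapped_starts`, `cluster_regions`, `read_len`, `step`, `max_gap`) are hypothetical.

```python
from collections import defaultdict


def generate_reads(genome, read_len, step):
    """Yield (start, read) for overlapping fixed-length windows of the genome."""
    for start in range(0, len(genome) - read_len + 1, step):
        yield start, genome[start:start + read_len]


def multi_mapped_starts(genome, read_len, step):
    """Return sorted start positions of reads that occur at more than one location.

    Assumption: exact matching via a substring index replaces a real aligner,
    so mismatches and indels between repeat copies are not tolerated here.
    """
    index = defaultdict(list)
    for pos in range(len(genome) - read_len + 1):
        index[genome[pos:pos + read_len]].append(pos)
    return sorted(
        start
        for start, read in generate_reads(genome, read_len, step)
        if len(index[read]) > 1
    )


def cluster_regions(starts, read_len, max_gap):
    """Merge multi-mapped reads into half-open [start, end) candidate regions.

    Reads whose start lies within max_gap of the current region's end are
    merged into that region; otherwise a new region is opened.
    """
    regions = []
    for s in starts:
        end = s + read_len
        if regions and s - regions[-1][1] <= max_gap:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([s, end])
    return [tuple(r) for r in regions]


def find_candidate_repeats(genome, read_len, step, max_gap):
    """Run the toy pipeline end to end and return candidate repeat regions."""
    starts = multi_mapped_starts(genome, read_len, step)
    return cluster_regions(starts, read_len, max_gap)
```

For a toy genome made of two copies of an 8 bp unit separated by a unique 6 bp spacer, `find_candidate_repeats` with `read_len=4` reports the two copies as separate candidate regions. A real genome would need an aligner that tolerates divergence between copies, and a larger `max_gap` to bridge reads disrupted by structural variation.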
