Identification and Analysis of Palindromes for RNA Sequences

A palindrome is a string of the form S = A1A2 or S = A1aA2, where A1 and A2 are substrings of S, a is a single character, and the reverse of A2 exactly matches A1. DNA palindromes directly influence tumorigenesis because they form at microRNA genes, which are involved in tumor development. Furthermore, RNA palindromes play a crucial role in genomic instability and gene amplification in some human cancers. It is therefore important to develop effective approaches for identifying and characterizing approximate palindromes in biological sequences. In this paper, we develop a simple algorithm that finds all exact and approximate palindromes with up to k errors (where k is specified by the user) in two types of RNA sequence data: mRNA sequences of fusion genes and human microRNAs (miRNAs). We confirm that the palindromes in RNA sequences are A-U rich. A Kolmogorov-Smirnov test shows that the frequency distribution of palindromes in miRNAs differs from that in the mRNAs of fusion genes. The proposed algorithm is easy to implement and provides an effective tool for investigating the relationship between palindromes and cancer biology.
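To make the definition above concrete, the following Python sketch enumerates palindromes of both forms (A1A2 and A1aA2) with at most k errors by expanding outward from every possible center. It is not the algorithm proposed in the paper, which the abstract does not describe; it is a simple quadratic-time illustration. Several points are assumptions: an "error" is taken to be a mismatched character pair (substitution only, with no insertions or deletions), the names `approximate_palindromes`, `au_fraction`, and `min_len` are hypothetical, and a palindrome is read in the textual sense defined above rather than as a reverse complement.

```python
def approximate_palindromes(seq, k, min_len=4):
    """Return maximal palindromes in seq with at most k mismatched pairs.

    Each result is (start, end, mismatches), where seq[start:end] is the
    palindrome. Assumption: errors are substitutions only.
    """
    n = len(seq)
    found = []
    # There are 2n - 1 centers: even values sit on a character (form A1aA2),
    # odd values sit between two characters (form A1A2).
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        errors = 0
        # Extend outward while the error budget k is not exceeded.
        while left >= 0 and right < n:
            if seq[left] != seq[right]:
                if errors == k:
                    break
                errors += 1
            left -= 1
            right += 1
        start, end = left + 1, right
        if end - start >= min_len:
            found.append((start, end, errors))
    return found


def au_fraction(seq):
    """Fraction of A and U characters in an RNA string (0.0 if empty)."""
    if not seq:
        return 0.0
    return sum(base in "AU" for base in seq) / len(seq)


if __name__ == "__main__":
    rna = "GAUCAG"  # toy sequence, not real data
    for start, end, errors in approximate_palindromes(rna, k=1):
        hit = rna[start:end]
        print(hit, errors, round(au_fraction(hit), 2))
```

Because every center is extended greedily, a reported palindrome may end on a mismatched outer pair; a stricter variant could trim such pairs before reporting. The `au_fraction` helper only illustrates how the A-U content of each hit could be measured.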
