Locating tandem repeats in weighted sequences in proteins

A weighted biological sequence is a string in which each position may hold a set of characters, each with its own probability of occurrence. We locate all the tandem repeats in a weighted sequence. A repeated substring is called a tandem repeat if its occurrences are directly adjacent to one another. By introducing equivalence classes for weighted sequences, we identify the tandem repeats of every possible length using an iterative partitioning technique. We also present an algorithm for recording the tandem repeats and prove that the problem can be solved in O(n^2) time.
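
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the equivalence-class partitioning algorithm described in the abstract and does not achieve its time bound. It rests on several assumptions that the abstract does not state: a weighted sequence is modelled as a list of dictionaries mapping characters to probabilities, a string "occurs" at a position when the product of its character probabilities is at least a chosen `threshold`, and a tandem repeat requires each of the two adjacent copies to meet that threshold on its own. The names `valid_factors` and `tandem_repeats` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed model: one dict per position, mapping character -> probability.
WeightedSeq = List[Dict[str, float]]


def valid_factors(seq: WeightedSeq, start: int, length: int,
                  threshold: float) -> Dict[str, float]:
    """Strings of the given length starting at `start` whose probability
    (product of per-position probabilities) is at least `threshold`."""
    partial = {"": 1.0}
    for pos in range(start, start + length):
        extended = {}
        for prefix, prob in partial.items():
            for ch, weight in seq[pos].items():
                q = prob * weight
                if q >= threshold:  # prune: a product of probabilities never grows
                    extended[prefix + ch] = q
        partial = extended
        if not partial:
            break
    return partial


def tandem_repeats(seq: WeightedSeq,
                   threshold: float) -> List[Tuple[int, int, str]]:
    """Brute-force search: return (start, period, u) for every string u that
    occurs at `start` and again, directly adjacent, at `start + period`."""
    n = len(seq)
    found = []
    for period in range(1, n // 2 + 1):
        for start in range(0, n - 2 * period + 1):
            left = valid_factors(seq, start, period, threshold)
            right = valid_factors(seq, start + period, period, threshold)
            for u in sorted(left):
                if u in right:
                    found.append((start, period, u))
    return found


if __name__ == "__main__":
    example = [
        {"A": 1.0},
        {"C": 0.5, "G": 0.5},
        {"A": 1.0},
        {"C": 0.5, "G": 0.5},
    ]
    print(tandem_repeats(example, 0.25))
```

In the example, positions 2 and 4 are uncertain between C and G, so both "AC" and "AG" qualify as repeated units of length 2 at the first position when the threshold is 0.25. Raising the threshold above 0.5 removes both. Other definitions of occurrence, such as requiring the whole repeat to meet the threshold, would change the result.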
