An RNA pattern matching program with enhanced performance and portability

The identification of RNA genes in DNA sequences generally involves searching the sequence or database for consensus nucleotides. However, specific base pairing patterns rather than sequences provide a better characterization of an increasing number of functional RNA molecules. Computer programs that automatically recognize higher-order structural motifs not only facilitate the identification of RNA genes, but also find an important application in the refinement of RNA structure descriptors, an enlightening task which involves the inference of essential structural elements associated with a molecular function. We present here a significant improvement of the program RNAMOT, which searches sequence databases for primary and secondary structural patterns (Gautheret et al., 1990). An important performance enhancement was achieved using a faster string-matching algorithm and more efficient sequence scans. RNAMOT can now perform complete GenBank searches for RNA motifs in a few hours. Other enhancements include an automatic determination of the optimal search order for structural motifs, the handling of sequences of virtually unlimited length and a full implementation of the IUPAC/IUB codes in either target sequences or RNA descriptors. RNAMOT is written in the ANSI C language and compiles on any workstation or personal computer. The memory requirement depends on the largest sequence used in the search (~3 Mbytes for a GenBank search); CPU time depends on the frequency of pattern elements in the target sequence.
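To illustrate what handling IUPAC/IUB codes in both target sequences and descriptors can involve, the following is a minimal sketch in C and not RNAMOT's actual code. It assumes that each code is represented as a bit set of the bases it denotes, and that two symbols match when their sets share at least one base. The function names (iupac_mask, iupac_compatible, iupac_find) are hypothetical, and the scan is a deliberately naive one, not the faster string-matching algorithm mentioned above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One bit per base; U is treated as equivalent to T. */
enum { BASE_A = 1, BASE_C = 2, BASE_G = 4, BASE_T = 8 };

/* Map an IUPAC/IUB nucleotide code to the set of bases it denotes.
 * Returns 0 for characters that are not nucleotide codes. */
unsigned iupac_mask(char code)
{
    switch (toupper((unsigned char)code)) {
    case 'A': return BASE_A;
    case 'C': return BASE_C;
    case 'G': return BASE_G;
    case 'T': case 'U': return BASE_T;
    case 'R': return BASE_A | BASE_G;           /* purine */
    case 'Y': return BASE_C | BASE_T;           /* pyrimidine */
    case 'S': return BASE_C | BASE_G;
    case 'W': return BASE_A | BASE_T;
    case 'K': return BASE_G | BASE_T;
    case 'M': return BASE_A | BASE_C;
    case 'B': return BASE_C | BASE_G | BASE_T;  /* not A */
    case 'D': return BASE_A | BASE_G | BASE_T;  /* not C */
    case 'H': return BASE_A | BASE_C | BASE_T;  /* not G */
    case 'V': return BASE_A | BASE_C | BASE_G;  /* not T */
    case 'N': return BASE_A | BASE_C | BASE_G | BASE_T;
    default:  return 0;
    }
}

/* Assumed matching rule: two symbols are compatible when the sets of
 * bases they denote intersect. Works with ambiguity on either side. */
int iupac_compatible(char a, char b)
{
    return (iupac_mask(a) & iupac_mask(b)) != 0;
}

/* Naive left-to-right scan: return the 0-based offset of the first
 * window of seq compatible with pattern, or -1 if there is none. */
long iupac_find(const char *seq, const char *pattern)
{
    size_t n, m, i, j;

    assert(seq != NULL && pattern != NULL);
    n = strlen(seq);
    m = strlen(pattern);
    if (m == 0 || m > n)
        return -1;

    for (i = 0; i + m <= n; i++) {
        for (j = 0; j < m; j++) {
            if (!iupac_compatible(seq[i + j], pattern[j]))
                break;
        }
        if (j == m)
            return (long)i;
    }
    return -1;
}
```

Under these assumptions, iupac_find("GGACGUAA", "RYGU") returns 2, because A matches R, C matches Y, and G and U match themselves. The bit-set representation reduces each symbol comparison to a single AND operation, whichever side carries the ambiguity code.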