Structural analysis of genomic sequences with matched filtering

This work develops a pattern filtering approach for the analysis of genomic sequences. With this approach, the distances between successive occurrences of a given pattern in a sequence are first translated into a "gap sequence" of integers. Different patterns yield different gap sequences, and the similarity of two genomic sequences can be measured by processing the gap sequences generated by a set of pre-selected patterns. A matched filter is applied to the gap sequences, and several post-processing techniques are then applied to the filtered result for signal enhancement. For example, the modified Butterworth window (MBW) is used to remove the edge effect of the matched filter output, and the uncertain region is handled by the advanced similarity test (AST) algorithm. A match between gap sequences is called a "frame match". An actual match of two genomic sequences requires both a frame match and a stuffing match. The proposed approach is useful for sequence analysis based on frame matches with desirable patterns. Extensive experimental results are presented to demonstrate the performance of the proposed method.
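To make the gap-sequence idea concrete, the following minimal Python sketch shows one way the first two steps might look. It is an illustration under stated assumptions, not the implementation used in this work: the function names `gap_sequence` and `matched_filter` are hypothetical, a gap is assumed to be the distance between the start positions of consecutive pattern occurrences, and the matched filter is approximated by a sliding normalized cross-correlation. The MBW and AST post-processing steps are not shown.

```python
from math import sqrt


def gap_sequence(seq, pattern):
    """Return distances between consecutive occurrences of `pattern` in `seq`.

    Assumption: a gap is measured between start positions, and
    overlapping occurrences are counted.
    """
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start)
        start = seq.find(pattern, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]


def matched_filter(signal, template):
    """Slide `template` over `signal` and return one score per offset.

    Assumption: the score is the normalized cross-correlation of the
    mean-removed window and template, so an exact match scores 1.0.
    """
    m = len(template)
    t_mean = sum(template) / m
    t = [x - t_mean for x in template]
    t_norm = sqrt(sum(x * x for x in t))
    scores = []
    for k in range(len(signal) - m + 1):
        window = signal[k:k + m]
        w_mean = sum(window) / m
        w = [x - w_mean for x in window]
        w_norm = sqrt(sum(x * x for x in w))
        denom = t_norm * w_norm
        # A constant window or template carries no shape information.
        scores.append(sum(a * b for a, b in zip(w, t)) / denom if denom else 0.0)
    return scores


# Gap sequence of the pattern "ACG" in a short toy sequence.
gaps = gap_sequence("ACGTTACGAACGTTTTACG", "ACG")

# Search for that gap sequence inside a longer, made-up gap sequence.
scores = matched_filter([3, 5, 4, 7, 2, 6], gaps)
best_offset = max(range(len(scores)), key=scores.__getitem__)
```

In this toy example, `best_offset` marks where the template gap sequence aligns best with the longer one, which corresponds to a candidate frame match. Whether the stuffing between pattern occurrences also matches would have to be checked separately.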