tuple_plot: Fast pairwise nucleotide sequence comparison with noise suppression

SUMMARY The program tuple_plot identifies and visualizes local similarities between two genomic sequences, typically 100 kb or longer, by applying the well-known dotplot principle. A dictionary of sequence words is built from the input sequences and used to construct a task-specific expectancy model, which assigns significance values to pairwise word hits. The dictionary-based approach makes the computation fast: run time scales as O(N log N) with the size N of the input sequences. The proposed scoring scheme appreciably increases the signal-to-noise ratio and may help to improve other word-based sequence comparison approaches.

AVAILABILITY tuple_plot is available at http://genome.fli-leibniz.de/software.html and may be used under the GNU public license.
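
The word-dictionary idea behind the summary can be illustrated with a minimal Python sketch. It is not the tuple_plot implementation: the function names `word_positions` and `scored_hits`, the word length `k`, the hash-based index, and the scoring formula are all assumptions chosen for illustration. The sketch indexes every word of length k in both sequences, reports each shared word as a dot at (position in A, position in B), and gives frequent words a lower score so that repeats and low-complexity regions contribute less noise.

```python
from collections import defaultdict
from math import log


def word_positions(seq, k):
    """Map each word of length k to the list of its start positions in seq."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def scored_hits(seq_a, seq_b, k=4):
    """Return sorted (i, j, score) tuples for every word shared by both sequences.

    Illustrative scoring only (an assumption, not tuple_plot's expectancy
    model): the score is -log of the product of the word's relative
    frequencies in the two sequences, so rare words score higher.
    """
    idx_a = word_positions(seq_a, k)
    idx_b = word_positions(seq_b, k)
    n_a = len(seq_a) - k + 1  # number of words in sequence A
    n_b = len(seq_b) - k + 1  # number of words in sequence B
    hits = []
    for word, pos_a in idx_a.items():
        pos_b = idx_b.get(word)
        if not pos_b:
            continue  # word does not occur in sequence B
        chance = (len(pos_a) / n_a) * (len(pos_b) / n_b)
        score = -log(chance)
        for i in pos_a:
            for j in pos_b:
                hits.append((i, j, score))
    return sorted(hits)


if __name__ == "__main__":
    for i, j, score in scored_hits("ACGTAAAA", "TTACGTAAAAAA"):
        print(i, j, round(score, 2))
```

In this toy input the unique word ACGT yields a single high-scoring dot, while AAAA, which occurs three times in the second sequence, yields three lower-scoring dots. Thresholding on such a score is one simple way to suppress repeat-induced noise in a dotplot.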