What Are All Those Funny Symbols in a BLAST Printout?
BLAST = Basic Local Alignment Search Tool

The aim of BLAST is to see whether a "query" sequence significantly matches some part(s) of a (large) database, for example the one at NCBI.

To introduce the concepts we start with a simple example. Do the two DNA sequences below show significant evidence of matching? (Matches are denoted by downward arrows.)

    ↓ ↓  ↓  ↓↓↓ ↓↓        ↓↓↓
    ggagactgtagacagctaatgctata
    gaacgccctagccacgagcccttatc

We operate statistically. That is, we set up a null hypothesis (that the two sequences were generated at random with respect to each other) and an alternative hypothesis (that in some sense there is a similarity between them). We will assess the acceptability of the null hypothesis by calculating a P-value.

To simplify the presentation, we assume for the moment that each nucleotide arises at any site with probability ¼. (The full theory relaxes this assumption.)

We could do a global test. Under the null hypothesis, the probability of a match at any site is ¼. This would lead to a test based on the binomial distribution: did we get significantly more matches than expected by chance? However, we want a "local", not a "global", test, for reasons to be discussed later.
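As a rough illustration of the global test described above, the sketch below counts the matching sites in the two example sequences and computes the binomial upper-tail probability of seeing at least that many matches by chance. It assumes, as the text does for now, a match probability of ¼ at every site, and additionally assumes that sites are independent. The names `count_matches` and `binomial_upper_tail` are hypothetical helpers introduced only for this example; this is not how BLAST itself computes significance.

```python
from math import comb

# The two example sequences from the text (26 sites each).
SEQ1 = "ggagactgtagacagctaatgctata"
SEQ2 = "gaacgccctagccacgagcccttatc"


def count_matches(a: str, b: str) -> int:
    """Count sites where two equal-length sequences carry the same letter."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_sites = len(SEQ1)
n_matches = count_matches(SEQ1, SEQ2)

# Assumption: independent sites, each matching with probability 1/4 under the null.
p_value = binomial_upper_tail(n_sites, n_matches, 0.25)
print(f"{n_matches} matches in {n_sites} sites, global P-value = {p_value:.4f}")
```

For these sequences there are 11 matches in 26 sites, against an expected 6.5 under the null hypothesis, and the global P-value comes out at about 0.04. This is only the global calculation; the local test that BLAST actually uses is a different procedure.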