Interpreting the Human Genome Sequence Using Stochastic Grammars

The 3-billion-base-pair sequence of the human genome is now available, and attention is turning to annotating it to extract biological meaning. I will discuss what we have obtained and the methods being used to analyse biological sequences. In particular, I will discuss approaches based on stochastic grammars, analogous to those used in computational linguistics, for both gene finding and protein family classification.
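
As a rough illustration of the stochastic-grammar idea mentioned above, a hidden Markov model can be viewed as a stochastic regular grammar, and the most probable "parse" of a DNA sequence can be found with the Viterbi algorithm. The sketch below is only a toy: the two states ("N" for non-coding, "C" for coding), the function name `viterbi`, and every probability are assumptions made up for this example, not values or methods taken from the talk. Real gene finders use far richer models.

```python
import math

# Hypothetical two-state model: "N" (non-coding) and "C" (coding).
# All probabilities below are illustrative assumptions, not trained values.
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {
    "N": {"N": 0.9, "C": 0.1},
    "C": {"N": 0.1, "C": 0.9},
}
EMIT = {
    "N": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # assumed AT-rich
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},  # assumed GC-rich
}


def viterbi(seq):
    """Return the most probable state path for seq as a string of state names."""
    if not seq:
        return ""
    # score[s] is the best log-probability of any path ending in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # one dict of back-pointers per position after the first
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in s at this position.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
            pointers[s] = prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```

Under these assumed parameters, `viterbi("AAAAAAAAAAGGGGGGGGGG")` labels the first ten bases "N" and the last ten "C": the model switches state only where the change in base composition outweighs the cost of a transition.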