Automating and streamlining inference of function of plant ESTs within a data analysis system

Automated sequencing of partial cDNA clones as expressed sequence tags (ESTs), followed by similarity searches of these ESTs against public DNA and protein sequence databanks, is becoming a well-accepted method for identifying the genes of any given species. Local similarity search algorithms have been preferred as a screening mechanism for the short EST sequences, because they score only those regions that are conserved between sequences. Furthermore, local similarities of translated EST sequences to public protein databank sequences can often be detected when similarities to the public DNA databanks appear coincidental. Both the human brain and the Caenorhabditis elegans cDNA EST projects have used similarity searches to positively identify known genes and to assign putative functions to others. ESTs that show no significant similarity to the public databases are treated as possible novel genes. The authors show that shorter ESTs are sufficient to meet EST project objectives for plant genomes.
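
To illustrate why local similarity scoring suits short ESTs, the following is a minimal sketch of a Smith-Waterman style local alignment score. It is offered only as an illustration: the abstract does not name the algorithm or scoring scheme used, so the function name `local_alignment_score` and the match, mismatch, and gap values are assumptions chosen for readability.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Smith-Waterman recurrence with a linear gap penalty. The scoring
    values are illustrative assumptions, not taken from the source.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment restart anywhere, so
            # unrelated flanking sequence cannot drag the score down.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    est = "TTTTACGTACGT"            # hypothetical short EST
    databank = "GGGACGTACGTCCC"     # hypothetical databank entry
    print(local_alignment_score(est, databank))
```

Because the score is clamped at zero, only the conserved region shared by the two sequences contributes, and the dissimilar flanks of either sequence are ignored. Production searches rely on heuristic tools and statistical significance estimates rather than a raw score like this one.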