Further progress towards a catalogue of all Arabidopsis genes: analysis of a set of 5000 non-redundant ESTs.

Nearly 7000 Arabidopsis thaliana-expressed sequence tags (ESTs) from 10 cDNA libraries have been sequenced, of which almost 5000 non-redundant tags have been submitted to the EMBL data bank. The quality of the cDNA libraries used is analysed. Similarity searches in international protein data banks have allowed the detection of significant similarities to a wide range of proteins from many organisms. Alignment with ESTs from the rice systematic sequencing project has allowed the detection of amino acid motifs which are conserved between the two organisms, thus identifying tags to genes encoding highly conserved proteins. These genes are candidates for a common framework in genome mapping projects in different plants.