Evaluating and improving cDNA sequence quality with cQC

SUMMARY Errors are prevalent in cDNA sequences, but the extent to which sequence collections differ in the frequencies and types of errors has not been investigated systematically. cDNA quality control, or cQC, was developed to evaluate the quality of cDNA sequence collections and to revise those sequences that differ from a higher-quality genomic sequence. After removing rRNA, vector, bacterial insertion sequence and chimeric cDNA contaminants, small-scale nucleotide discrepancies were found in 51% of cDNA sequences from one Arabidopsis cDNA collection, 89% from a second Arabidopsis collection and 75% from a rice collection. These errors created premature termination codons in 4% and 42% of cDNA sequences in the respective Arabidopsis collections and in 7% of the rice cDNA sequences.
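The summary does not describe how cQC detects discrepancies or premature termination codons. The sketch below only illustrates the underlying idea and is not cQC's implementation. It assumes that the cDNA and genomic coding sequences are already aligned, are gap-free and of equal length, and share the same in-frame start codon. It also uses the standard nuclear genetic code stop codons (TAA, TAG, TGA). The function names `first_stop`, `substitutions` and `has_premature_stop` are hypothetical. The sketch handles only substitutions. Insertions and deletions shift the reading frame and would require a proper alignment step.

```python
from typing import List, Optional, Tuple

# Standard-code stop codons (assumption: nuclear genes, standard genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(cds: str) -> Optional[int]:
    """Return the codon index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def substitutions(cdna: str, genomic: str) -> List[Tuple[int, str, str]]:
    """List (position, genomic base, cDNA base) wherever two pre-aligned,
    equal-length sequences differ. Indels are not handled (hypothetical helper)."""
    if len(cdna) != len(genomic):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        (i, g, c)
        for i, (c, g) in enumerate(zip(cdna.upper(), genomic.upper()))
        if c != g
    ]


def has_premature_stop(cdna_cds: str, genomic_cds: str) -> bool:
    """True if the cDNA hits an in-frame stop codon earlier than the genomic
    reference does, i.e. a discrepancy created a premature termination codon."""
    observed = first_stop(cdna_cds)
    if observed is None:
        return False
    reference = first_stop(genomic_cds)
    return reference is None or observed < reference


if __name__ == "__main__":
    genomic = "ATGCAGTGGTAA"  # ATG CAG TGG TAA: stop at codon 3
    cdna = "ATGCAGTAGTAA"     # G->A at position 7 turns TGG into TAG
    print(substitutions(cdna, genomic))       # [(7, 'G', 'A')]
    print(has_premature_stop(cdna, genomic))  # True
```

In this toy example, a single G-to-A substitution converts a tryptophan codon (TGG) into a stop codon (TAG). A discrepancy of this kind would truncate the predicted protein, even though the sequences differ at only one position.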