‘Going wrong with confidence’: misleading sequence analyses of CiaB and ClpX
In a recent MicroCorrespondence, Kyrpides and Ouzounis (1999, Mol Microbiol 32: 886) emphasized the perils of misleading and erroneous whole-genome sequence annotations. Unfortunately, this problem is not limited to whole-genome annotation, but also occurs at the level of individual sequences. We should like to comment on two recent examples of this, one of which, ironically, appeared in the same issue as the article by Kyrpides and Ouzounis. We read with excitement of the work by Konkel et al. (1999, Mol Microbiol 32: 691), showing the importance of the ciaB gene and its product in the internalization of Campylobacter jejuni into cultured mammalian cells. However, their claim, based on ‘direct sequence comparisons’, that the deduced amino acid sequence of CiaB (Cj0914c in the Sanger CDS notation for the genome: http://www.sanger.ac.uk/Projects/C_jejuni/) is similar to those of SipB, IpaB and YopB is spurious and misleading. Furthermore, it leads them to speculate that C. jejuni might contain a type III secretion system, when BLAST searches of the genome sequence (http://www.sanger.ac.uk/Projects/C_jejuni/BLAST_server.shtml, http://www.ncbi.nlm.nih.gov/BLAST/unfinishedgenome.html) would have shown them that it does not. In a paper describing experimental evidence as to which portion of ClpX (a molecular chaperone and the regulatory subunit of the ClpXP protease) mediates binding specificity, Levchenko et al. (1997, Cell 91: 939–947) make a similarly unjustified claim that the C-terminal portion of ClpX contains two repeats homologous to PDZ domains, based on ‘pairwise comparisons between the PDZ domains and individual Clp/Hsp100 sequences’. Both of these claims illustrate the problem that arises when investigators compare selected sequences one with another to fit preconceived notions derived from laboratory findings.
In pairwise comparisons, any sequence can be aligned with any other sequence given sufficient gaps and lax-enough definitions of sequence similarity. The levels of sequence identity reported by Konkel et al. (16.8–20%) fall well into the twilight zone (Doolittle, 1986, Of URFs and ORFs, Oxford: Oxford University Press) and Levchenko et al. admit that their alignments have low statistical significance, although they do not report any figures. Claims that sequences are sufficiently similar to be considered homologous (i.e. have arisen through divergent evolution from a common ancestor) should only be made after unprejudiced searches of sequence databases using sequence analysis methods that provide estimates of the statistical significance of sequence similarities (e.g. FASTA, BLAST: Altschul et al., 1990, J Mol Biol 215: 403–410; Altschul et al., 1997, Nucleic Acids Res 25: 3389–3402; Pearson and Lipman, 1988, Proc Natl Acad Sci USA 85: 2444–2448). We have attempted and failed to validate the claims of Konkel et al. and Levchenko et al. using these approaches. Furthermore, not only does direct comparison of Cj0914c with SipB using the GCG GAP program with the randomization option (Wisconsin Package Version 9.1, Genetics Computer Group (GCG), Madison, WI) show that the pairwise matches found are not statistically significant, but when SipB is searched, using BLASTP, against all protein sequences encoded by the C. jejuni genome, Cj0914c does not rank anywhere in the 31 reported hits. We must therefore conclude that there is no evidence from sequence analysis to link CiaB to SipB/IpaB/YopB, nor to support the presence of PDZ domains in ClpX. It is worth noting that neither paper describes, among experimental procedures, any of the methods used to perform or assess the significance of sequence analyses.
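The logic of a randomization test of this kind can be sketched in a few lines of Python. This is only an illustration of the general idea, not a reproduction of the GCG GAP program or of the analyses in our supplement: the function names (`global_score`, `shuffle_test`), the simple identity scoring (match +1, mismatch -1, gap -2) and the number of shuffles are all assumptions made for the example, and a real analysis would use a substitution matrix and affine gap penalties. One sequence is shuffled repeatedly, which preserves its length and composition, and the score of the real alignment is compared with the distribution of scores obtained from the shuffled sequences.

```python
import random
import statistics


def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not those of any
    published program.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def shuffle_test(a, b, n_shuffles=100, seed=0):
    """Compare the real alignment score with scores against shuffled b.

    Returns (observed, null_mean, null_sd, z). A z-score near zero means
    the real alignment scores no better than composition-matched noise.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = global_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # same length and composition, random order
        null_scores.append(global_score(a, "".join(letters)))
    null_mean = statistics.mean(null_scores)
    null_sd = statistics.pstdev(null_scores)
    z = (observed - null_mean) / null_sd if null_sd > 0 else float("nan")
    return observed, null_mean, null_sd, z
```

Called as `shuffle_test(seq_a, seq_b)` on two protein sequences held as plain strings, the sketch returns the observed score together with the mean and standard deviation of the shuffled scores. An alignment whose score lies within a few standard deviations of the shuffled mean provides no support for homology, however plausible it may look when printed.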
We urge that authors, editors, referees and readers of papers should, in future, demand the same rigour in the description, performance and assessment of sequence analysis as they do currently for bench-based research. The alternative to this is the propagation in the literature of misleading, erroneous and spurious functional assignments, as has unfortunately already occurred with the fictional PDZ domains of ClpX (Feng et al., 1998, Curr Biol 8: R464–467; Spiess et al., 1999, Cell 97: 339–347). The methods and results of our analyses are presented in an online supplement on the Molecular Microbiology web site (http://www.blackwell-science.com/mmi).

Mark Pallen,* Brendan Wren and Julian Parkhill
Department of Medical Microbiology, St Bartholomew's and the Royal London School of Medicine and Dentistry, London EC1A 7BE, UK.
The Pathogen Group, The Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
*For correspondence. E-mail m.pallen@qmw.ac.uk; Tel. (44) 171 601 8414; Fax (44) 171 601 8409.
Received 30 June, 1999; accepted 1 July, 1999.
Molecular Microbiology (1999) 34(1), 195