Production of full-length cDNA sequences by sequencing and analysis of expressed sequence tags from Schistosoma mansoni.

The number of sequences generated by genome projects has increased exponentially, but gene characterization has not kept pace. Sequencing and analysis of full-length cDNAs is an important step in gene characterization that is now used by several research groups. In this work, we selected Schistosoma mansoni clones for full-length sequencing using an algorithm that infers whether a parasite sequence contains the initial methionine from the alignment start positions in the two aligned sequences. The alignments were produced by BLAST searches of parasite expressed sequence tags (ESTs) generated by the Minas Gerais Genome Network against the Eukaryotic Clusters of Orthologous Groups (KOG) database. This procedure selected clones representing 398 proteins for which no complete S. mansoni CDS had been deposited in any public database. Ninety-six of these clones were sequenced from both the 5' and 3' ends. The reads were assembled with PHRAP, yielding 33 full-length sequences that represent novel S. mansoni proteins. These results should contribute to a more complete view of the biology of this important parasite.
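The abstract does not spell out the selection rule, so the following is only a minimal sketch of one way to use alignment start positions, assuming BLASTX-style hits of ESTs (nucleotide queries) against KOG proteins reported in BLAST tabular format (`-outfmt 6`). The idea: if the alignment begins at residue `s_start` of the orthologue, the EST needs `3 * (s_start - 1)` nucleotides upstream of `q_start` to reach residue 1; if it does and an ATG sits there, the clone likely contains the initial methionine. All names (`Hit`, `parse_outfmt6`, `predicted_start_codon`, `likely_contains_initial_met`) are hypothetical, and reverse-strand hits are simply skipped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """One HSP of an EST (nucleotide query) against a KOG protein (subject).
    Coordinates are 1-based, as in BLAST tabular output."""
    est_id: str
    kog_id: str
    q_start: int  # first aligned EST nucleotide
    s_start: int  # first aligned residue of the KOG protein


def parse_outfmt6(line: str) -> Optional[Hit]:
    """Parse one line of BLASTX '-outfmt 6' output (12 default columns).
    Hypothetical simplification: reverse-strand hits (qstart > qend) are skipped."""
    f = line.rstrip("\n").split("\t")
    q_start, q_end, s_start = int(f[6]), int(f[7]), int(f[8])
    if q_start > q_end:
        return None
    return Hit(f[0], f[1], q_start, s_start)


def predicted_start_codon(hit: Hit) -> int:
    """EST position where the orthologue's residue 1 would be encoded.
    The s_start - 1 unaligned N-terminal residues need 3 nt each upstream."""
    return hit.q_start - 3 * (hit.s_start - 1)


def likely_contains_initial_met(hit: Hit, est_seq: str) -> bool:
    """True if the EST reaches the orthologue's N-terminus and has ATG there."""
    pos = predicted_start_codon(hit)
    if pos < 1:
        return False  # EST is 5'-truncated relative to the orthologue
    return est_seq[pos - 1:pos + 2].upper() == "ATG"
```

For example, a hit whose alignment starts at EST nucleotide 6 and protein residue 2 places residue 1 at nucleotide 3; in the toy EST `CCATGGCTAAA` that position holds ATG, so the clone would be flagged as a full-length candidate. A real pipeline would likely also tolerate small N-terminal length differences between orthologues and check the 3' end for a stop codon.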
