Multigenic families and proteomics: Extended protein characterization as a tool for paralog gene identification

In classical proteomic studies, protein databases are not exhaustive, so database searches mostly identify protein functions by homology. The quality of the identification depends on the organism studied, its complexity, and how well it is represented in protein databases. However, such function-level identification is insufficient for certain applications, notably the development of RNA-based gene-silencing strategies, which require unambiguous identification of the targeted gene sequence. These strategies are commonly termed RNA interference (RNAi) in animals and post-transcriptional gene silencing (PTGS) in plants. A PTGS strategy was considered for studying the infection of Oryza sativa by Rice yellow mottle virus (RYMV). RYMV is suspected to recruit host proteins after entering plant cells, forming a complex that facilitates virus multiplication and spread. The protein partners of this complex were identified by a classical proteomic approach, nano-liquid chromatography tandem mass spectrometry (nanoLC-MS/MS), and several of the identified proteins were retained for a PTGS strategy. However, most of these candidates appear to belong to multigenic families whose paralog genes are not all present in protein databases, so classical protein database searches cannot determine which paralog is actually expressed. Because the genome contains every gene, and thus every paralog, a whole-genome search strategy was developed to determine which specific paralog is expressed. With this approach, identifying peptides that match only a single gene, termed discriminant peptides, provides definitive proof that this gene is expressed. The strategy has two requirements: (i) a completely sequenced and accessible genome; and (ii) high protein sequence coverage.
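The notion of a discriminant peptide can be illustrated with a minimal sketch. It assumes that the candidate paralog protein sequences are already available (for example, deduced from the genome) and that the observed peptides have already been sequenced by MS/MS. It then simply checks which peptides occur in exactly one paralog. The function names and the toy sequences are hypothetical and are not the authors' implementation. Leucine and isoleucine are collapsed because they are isobaric and cannot be told apart by MS/MS. The coverage function relates to requirement (ii): the more of a protein that is covered, the more likely the observed peptides span a position where paralogs differ.

```python
from typing import Dict, List


def _norm(seq: str) -> str:
    # Leu and Ile have identical masses, so MS/MS cannot tell them apart;
    # treat them as the same residue when matching.
    return seq.upper().replace("I", "L")


def find_discriminant_peptides(
    peptides: List[str], paralogs: Dict[str, str]
) -> Dict[str, str]:
    """Map each peptide that occurs in exactly one paralog to that paralog's ID."""
    normalized = {gene: _norm(seq) for gene, seq in paralogs.items()}
    discriminant: Dict[str, str] = {}
    for pep in peptides:
        hits = [gene for gene, seq in normalized.items() if _norm(pep) in seq]
        if len(hits) == 1:  # shared peptides support the family, not one member
            discriminant[pep] = hits[0]
    return discriminant


def sequence_coverage(peptides: List[str], protein: str) -> float:
    """Fraction of residues in `protein` covered by at least one observed peptide."""
    target = _norm(protein)
    covered = [False] * len(target)
    for pep in map(_norm, peptides):
        start = target.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            covered[start:start + len(pep)] = [True] * len(pep)
            start = target.find(pep, start + 1)
    return sum(covered) / len(target) if target else 0.0


if __name__ == "__main__":
    # Hypothetical paralogs differing by a single residue (E vs D).
    paralogs = {"gene_A": "MASKTLLVAGEPKGSDR", "gene_B": "MASKTLLVAGDPKGSDR"}
    peptides = ["TLLVAGEPK", "GSDR", "TLIVAGDPK"]
    print(find_discriminant_peptides(peptides, paralogs))
    print(round(sequence_coverage(peptides, paralogs["gene_A"]), 3))
```

With this toy data, `TLLVAGEPK` is assigned to `gene_A` and `TLIVAGDPK` to `gene_B` (once I/L are collapsed). `GSDR` occurs in both paralogs, so it is not discriminant. The peptides cover 13 of the 17 residues of `gene_A`.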
In the present work, we use three examples to report and validate, for the first time, a genome database search strategy that specifically identifies which paralogs of multigenic families are expressed under specific conditions.
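A genome database search can be illustrated in a much simplified form. The sketch below translates each genomic sequence in all six reading frames and reports every locus where an already-sequenced peptide occurs. A peptide found at exactly one locus is treated as discriminant. This is an assumption-laden sketch and not the authors' pipeline, which searched mass spectrometry data against the genome. The function names and contig data are hypothetical. Exact string matching is used, with Leu and Ile treated as equivalent. Peptides spanning an exon-exon junction, or carrying sequence variants, will not be found by this approach.

```python
from typing import Dict, List, Tuple

# Standard genetic code (NCBI translation table 1), codons ordered TCAG x TCAG x TCAG.
_BASES = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AAS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    return dna.translate(_COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(dna[p:p + 3], "X") for p in range(0, len(dna) - 2, 3)
    )


def six_frame(dna: str) -> List[Tuple[str, int, str]]:
    """Return (strand, frame, protein) for three forward and three reverse frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    return [(s, f, translate(seq[f:])) for s, seq in (("+", dna), ("-", rc)) for f in range(3)]


def search_genome(contigs: Dict[str, str], peptide: str) -> List[Tuple[str, str, int, int]]:
    """Return every (contig, strand, frame, aa_offset) where the peptide occurs.
    Leu/Ile are treated as equivalent because they are isobaric in MS."""
    target = peptide.upper().replace("I", "L")
    hits: List[Tuple[str, str, int, int]] = []
    for name, dna in contigs.items():
        for strand, frame, prot in six_frame(dna):
            prot = prot.replace("I", "L")
            pos = prot.find(target)
            while pos != -1:
                hits.append((name, strand, frame, pos))
                pos = prot.find(target, pos + 1)
    return hits


def is_discriminant(contigs: Dict[str, str], peptide: str) -> bool:
    """A peptide is discriminant here if it maps to exactly one genomic locus."""
    return len(search_genome(contigs, peptide)) == 1
```

For example, with the hypothetical contigs `{"contig1": "CCATGAAATGGCC", "contig2": "CCATTTCAT"}`, the peptide `MKW` is encoded on the forward strand of `contig1` and on the reverse strand of `contig2`, so it is not discriminant. By contrast, `HEMA` maps to a single locus. Searching the whole genome, rather than a curated protein database, is what lets paralogs absent from protein databases still be counted as competing matches.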