Correspondence discriminant analysis: a multivariate method for comparing classes of protein and nucleic acid sequences

This report describes two applications of correspondence discriminant analysis (CDA), a multivariate method for studying classes of nucleotide or protein sequences. The first example is the discrimination between Escherichia coli proteins according to their subcellular location (membrane, cytoplasm and periplasm). The high resolution of the method made it possible to predict the subcellular location of E.coli proteins for which this information is not known. The second example is the discrimination between the coding sequences of leading and lagging strands in four bacteria: Mycoplasma genitalium, Haemophilus influenzae, E.coli and Bacillus subtilis. The programs used for computing the analysis are integrated in a publicly available package that runs on MacOS 7.x or Windows 95 operating systems (http://biomserv.univ-lyon1.fr/ADE-4.html). These programs are also accessible through our World Wide Web server (http://biomserv.univ-lyon1.fr/NetMul.html).
