
PseqIP: A nonredundant and exhaustive protein sequence data bank generated from 4 major existing collections

Four major protein sequence data collections (NBRF-PIR, PSD-Kyoto, PGtrans, and NEWAT) have been merged into a single nonredundant data bank called PseqIP. A heuristic computer program matched the data bank entries automatically, relying on a fast computation of the number of tetrapeptides shared by two sequences. PseqIP 1.0 includes 6,068 different protein sequences totaling 1,357,067 residues, which represents most of the sequence information available to date. During this work, we found about 600 cases in which the same protein sequence was recorded with a one-amino-acid variation in at least two different data banks. A flat-file version of PseqIP 1.0 (ASCII, computer-readable format) is available from our laboratory and is well suited to exhaustive homology searches and statistical sequence analysis.
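The abstract does not describe the matching program in detail. As a rough illustration only, the sketch below shows one way to compute shared-tetrapeptide counts quickly. It builds an inverted index from each tetrapeptide (an overlapping 4-residue word) to the entries that contain it, then finds the entry sharing the most distinct tetrapeptides with a query. It also flags pairs of sequences that differ at exactly one position. Several details are assumptions and do not come from the paper: the function names, the 0.8 acceptance threshold, the toy sequences and IDs, and the substitution-only reading of "one-amino-acid variation".

```python
from collections import Counter, defaultdict

WORD = 4  # tetrapeptide length

def tetrapeptides(seq: str) -> set[str]:
    """Distinct overlapping 4-residue words of a protein sequence."""
    return {seq[i:i + WORD] for i in range(len(seq) - WORD + 1)}

def build_index(bank: dict[str, str]) -> dict[str, set[str]]:
    """Map each tetrapeptide to the IDs of the bank entries containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for entry_id, seq in bank.items():
        for word in tetrapeptides(seq):
            index[word].add(entry_id)
    return index

def shared_counts(query: str, index: dict[str, set[str]]) -> Counter:
    """Number of distinct tetrapeptides the query shares with each entry.
    Only entries sharing at least one word are ever touched."""
    counts: Counter = Counter()
    for word in tetrapeptides(query):
        for entry_id in index.get(word, ()):
            counts[entry_id] += 1
    return counts

def best_match(query: str, index: dict[str, set[str]], min_fraction: float = 0.8):
    """Entry sharing the most tetrapeptides with the query, or None if it shares
    fewer than min_fraction of the query's words (hypothetical threshold)."""
    words = tetrapeptides(query)
    counts = shared_counts(query, index)
    if not words or not counts:
        return None
    entry_id, shared = counts.most_common(1)[0]
    return entry_id if shared / len(words) >= min_fraction else None

def one_residue_variant(a: str, b: str) -> bool:
    """True if a and b have equal length and differ at exactly one position
    (substitutions only; insertions/deletions are not considered here)."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

if __name__ == "__main__":
    # Toy entries with hypothetical IDs, not real data bank records.
    bank = {
        "PIR:A": "MKTAYIAKQRQISFVKSHFSRQ",
        "NEWAT:B": "GSHMLEDPVAGLLKQW",
    }
    index = build_index(bank)
    query = "MKTAYIAKQRQISFVKSHFSRE"  # differs from PIR:A at the last residue
    print(best_match(query, index))                   # PIR:A
    print(one_residue_variant(query, bank["PIR:A"]))  # True
```

The inverted index is what keeps this sketch fast. A query is compared only with entries that share at least one tetrapeptide with it, not with every sequence in every collection.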

Jean-Michel Claverie | Laurence Bricault
