SEALS: A System for Easy Analysis of Lots of Sequences

We present a system of programs designed to facilitate sequence analysis projects involving large amounts of data. SEALS (System for Easy Analysis of Lots of Sequences) is a logically organized set of flexible, easily modifiable research tools designed to run on open systems. Its functionality is divided into approximately 50 commands that follow consistent syntax and semantics; wrappers are also provided for commonly used sequence analysis programs so that they, too, can be invoked with similar syntax. SEALS includes software for retrieving sequence information, scripting database search tools such as BLAST and MoST, viewing and analyzing search outputs, searching and processing nucleotide and protein sequences with regular expressions, and constructing rational predictions of protein features. The system provides modular elements that can be combined, modified, and integrated with other methods to quickly design and execute computer experiments for sequence analysis projects at the scale of whole genomes.
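To illustrate the kind of regular-expression search over protein sequences described above, here is a minimal Python sketch. It is not SEALS code, and it does not reproduce any SEALS command; the functions `prosite_to_regex`, `read_fasta`, and `find_motifs` and the toy sequences are hypothetical. The example pattern is the P-loop, PROSITE's ATP/GTP-binding site motif A, `[AG]-x(4)-G-K-[ST]`, translated from PROSITE syntax into a Python regular expression. Only a subset of PROSITE syntax is handled (terminal anchors are not).

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Hypothetical toy input: these sequences are made up for illustration.
EXAMPLE_FASTA = """>toy1 made-up sequence
MKLAGESG
SGKSTLLK
>toy2 made-up sequence
MSTNPQRLLEDW
"""

# One PROSITE pattern element: x, a residue, [allowed], or {forbidden},
# optionally followed by a repeat count such as (4) or (2,3).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a Python regex.

    Terminal anchors (< and >) are not supported in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs; the identifier is the first header word."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def find_motifs(records: Iterable[Tuple[str, str]], regex: str) -> List[Tuple[str, int, str]]:
    """Return (identifier, 1-based start, matched text) for every match.

    The lookahead wrapper lets overlapping matches be reported.
    """
    compiled = re.compile(f"(?=({regex}))")
    hits = []
    for name, seq in records:
        for m in compiled.finditer(seq):
            hits.append((name, m.start() + 1, m.group(1)))
    return hits


if __name__ == "__main__":
    p_loop = prosite_to_regex("[AG]-x(4)-G-K-[ST].")  # "[AG].{4}GK[ST]"
    for name, start, text in find_motifs(read_fasta(EXAMPLE_FASTA.splitlines()), p_loop):
        print(f"{name}\t{start}\t{text}")
```

Run as a script, this prints a single hit, `toy1`, at position 5 (`GESGSGKS`), and nothing for `toy2`. Wrapping the pattern in a lookahead is a deliberate choice: a plain `re.finditer` on the bare pattern skips any match that overlaps an earlier one, which can hide hits for short, degenerate motifs.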
