Computer-assisted sequencing, interval graphs, and molecular evolution.

In 1945, Fox developed the strategy for sequencing long proteins by using overlapping fragments. We show that the formal mathematical technique for constructing interval graphs (Gilmore and Hoffman, 1964) is both pedagogically useful for understanding the underlying logic of sequencing linear molecules and more amenable to automation because of its algorithmic nature. We also present a computer program that employs the interval graph algorithm to sequence proteins from digest data. An example illustrates every step involved in the algorithmic processing of the data. The need for such developments in molecular evolution is discussed.
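The interval-graph idea can be sketched in a few lines. This is a minimal sketch, not the authors' program. It assumes the overlap relation between fragments is already known from the digest data (two fragments are joined when they share residues). The fragment names A1 to A3, B1, B2 and C1, the overlap data, and the helper names `overlap_graph`, `maximal_cliques` and `consecutive_clique_order` are all hypothetical. The sketch enumerates maximal cliques with Bron-Kerbosch recursion with pivoting (cf. [26]), then searches for an ordering of those cliques in which every fragment occupies one consecutive run. Gilmore and Hoffman (1964) showed that such an ordering exists exactly when the graph is an interval graph, and the ordering gives the left-to-right arrangement of fragments along the chain.

```python
from itertools import permutations


def overlap_graph(pairs):
    """Build an undirected graph {fragment: set of overlapping fragments}."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph):
    """Bron-Kerbosch with pivoting: yield every maximal clique once."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with most neighbours in P to prune branches.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(graph), set())


def _consecutive(flags):
    """True if the True entries of flags form a single contiguous run."""
    idx = [i for i, f in enumerate(flags) if f]
    return idx[-1] - idx[0] + 1 == len(idx)


def consecutive_clique_order(graph):
    """Order the maximal cliques so each vertex's cliques are consecutive
    (Gilmore-Hoffman criterion); return None if the graph is not an
    interval graph. Brute-force search: only for small examples."""
    cliques = sorted(tuple(sorted(c)) for c in maximal_cliques(graph))
    for order in permutations(cliques):
        if all(_consecutive([v in c for c in order]) for v in graph):
            return list(order)
    return None


# Hypothetical overlap data: A1-A3 from one digest, B1-B2 from a second,
# C1 from a third; each pair shares residues in the (unknown) chain.
overlaps = [("A1", "B1"), ("B1", "A2"), ("B1", "C1"), ("A2", "C1"),
            ("A2", "B2"), ("B2", "C1"), ("B2", "A3")]
order = consecutive_clique_order(overlap_graph(overlaps))
for clique in order:
    print(clique)
# ('A1', 'B1')
# ('A2', 'B1', 'C1')
# ('A2', 'B2', 'C1')
# ('A3', 'B2')
```

Reading the cliques from left to right places A1 at one end of the chain and A3 at the other. The reverse ordering is equally valid, since overlap data alone cannot fix the chain's direction. The permutation search is exponential and only suits toy inputs; linear-time interval graph recognition (for example, Booth and Lueker's PQ-tree method) is the practical choice for larger data sets.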

[1] Derek G. Corneil, et al. Corrections to Bierstone's Algorithm for Generating Cliques. 1972, J. ACM.

[2] G. Allen, et al. Sequencing of proteins and peptides. 1981.

[3] Marvin B. Shapiro. An Algorithm for Reconstructing Protein and RNA Sequences. 1967, J. ACM.

[4] R. Eck. A Simplified Strategy for Sequence Analysis of Large Proteins. 1962, Nature.

[5] T. Gingeras, et al. Computer programs for the assembly of DNA sequences. 1979, Nucleic Acids Research.

[6] C. R. Merril, et al. Reconstruction of protein and nucleic acid sequences. IV. The algebra of free monoids and the fragmentation stratagem. 1966, Bulletin of Mathematical Biophysics.

[7] Dan F. Bradley, et al. Automatic Determination of Amino Acid Sequences. 1963, IBM J. Res. Dev.

[8] Joan P. Hutchinson, et al. On Eulerian Circuits and Words with Prescribed Adjacency Patterns. 1975, J. Comb. Theory, Ser. A.

[9] L. J. Korn, et al. [60] Computer analysis of nucleic acids and proteins. 1980.

[10] C. Lekkerkerker, et al. Representation of a finite graph by a set of intervals on the real line. 1962.

[11] E. Fanning, et al. Quantitative procedures for use with the Edman-Begg sequenator. Partial sequences of two unusual immunoglobulin light chains, Rzf and Sac. 1971, Biochemistry.

[12] G. Hutchinson, et al. Evaluation of polymer sequence fragment data using graph theory. 1969, Bulletin of Mathematical Biophysics.

[13] J. Jungck, et al. Group graph of the genetic code. 1979, Journal of Heredity.

[14] S. Wrobel, et al. Thomas Hunt Morgan. Pioneer of genetics. 1976, Medical History.

[15] S. Benzer. The fine structure of the gene. 1962, Scientific American.

[16] J. Moon, et al. On cliques in graphs. 1965.

[17] C. R. Merril, et al. Reconstruction of protein and nucleic acid sequences: alanine transfer ribonucleic acid. 1965, Science.

[18] T. Gallai. Transitiv orientierbare Graphen. 1967.

[19] S. Benzer, et al. On the topography of the genetic fine structure. 1961, Proceedings of the National Academy of Sciences of the United States of America.

[20] Amir Pnueli, et al. Permutation Graphs and Transitive Graphs. 1972, J. ACM.

[21] M. Golumbic. Algorithmic graph theory and perfect graphs. 1980.

[22] J. Ian Munro, et al. Efficient Determination of the Transitive Closure of a Directed Graph. 1971, Inf. Process. Lett.

[23] T. Gingeras, et al. Steps toward computer analysis of nucleotide sequences. 1980, Science.

[24] P. Erdös. On cliques in graphs. 1966.

[25] L. R. Croft. Introduction to protein sequence analysis. 1980.

[26] C. Bron, et al. Algorithm 457: finding all cliques of an undirected graph. 1973.

[27] E. Kay, et al. Graph Theory. An Algorithmic Approach. 1975.

[28] F. Roberts. Discrete Mathematical Models with Applications to Social, Biological, and Environmental Problems. 1976.

[29] R. V. Polozov, et al. Determination of the primary structure of linear heteropolymers. 1972.

[30] J. E. Cohen, et al. Food webs and niche space. 1979, Monographs in Population Biology.

[31] J. Jungck, et al. Pre-Darwinian and non-Darwinian evolution of proteins. 1971, Currents in Modern Biology.

[32] S. D. Daubert, et al. Computer simulation of the determination of amino acid sequences in polypeptides. 1977, Journal of Chemical Education.

[33] M. O. Dayhoff. Computer aids to protein sequence determination. 1965, Journal of Theoretical Biology.

[34] R. Polozov, et al. On the algorithms for determining the primary structure of biopolymers. 1979, Bulletin of Mathematical Biology.

[35] Margaret B. Cozzens, et al. Higher and multi-dimensional analogues of interval graphs. 1981.

[36] Sidney W. Fox, et al. Terminal Amino Acids in Peptides and Proteins. 1945.

[37] Jack Minker, et al. An Analysis of Some Graph Theoretical Cluster Techniques. 1970, J. ACM.

[38] A. Lempel, et al. Transitive Orientation of Graphs and Identification of Permutation Graphs. 1971, Canadian Journal of Mathematics.

[39] M. Lal. Directed Hamilton Circuits. 1967, IEEE Transactions on Circuit Theory.

[40] Mark Stefik, et al. Inferring DNA Structures from Segmentation Data. 1978, Artif. Intell.

[41] V. V. Shkurba. Mathematical processing of a class of biochemical experiments. 1965.

[42] R. Staden. A strategy of DNA sequencing employing computer programs. 1979, Nucleic Acids Research.

[43] J. F. Foster, et al. Introduction to protein chemistry. 1957.

[44] Peter J. Cameron, et al. 6-Transitive graphs. 1980, J. Comb. Theory, Ser. B.

[45] L. Festinger. The Analysis of Sociograms using Matrix Algebra. 1949.

[46] P. Gilmore, et al. A Characterization of Comparability Graphs and of Interval Graphs. 1964, Canadian Journal of Mathematics.

[47] Mary B. Williams. Needs for the Future: Radically Different Types of Mathematical Models. 1977.

[48] J. Lederberg. Topology of Molecules. 1969.

[49] R. Duggleby, et al. A computer program for determining the size of DNA restriction fragments. 1981, Analytical Biochemistry.