On representation of proteins by star-like graphs.

To arrive at a graphical representation of a protein, one is confronted with a number of arbitrary decisions about how to assign the 20 natural amino acids to equivalent or non-equivalent sites of the underlying geometrical object used to construct the representation. Here we consider representations of proteins based on generalized star graphs, which are graphs with a single central vertex of maximal degree to which other vertices of degree one or two are attached. The matrix representation of proteins based on star-like graphs has an important advantage: while its pictorial form depends on the selected assignment of amino acids to the branches of the star graph, its properties do not depend on that assignment. Hence the derived graph invariants, free of the artifacts associated with graphical representations of biosequences, should better reflect the inherent properties of protein structure. We describe several graph invariants, mostly extracted from the distance matrices of star-like graphs, which can serve as protein descriptors. The approach is illustrated on strand A of human insulin.
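
The idea of an assignment-independent distance matrix can be illustrated with a minimal sketch. It rests on an assumption not stated in the abstract: each amino acid type gets its own branch, and each occurrence of that amino acid adds one more vertex to the end of that branch. The function names `star_graph_distance_matrix` and `wiener_index` are hypothetical, and the Wiener index (the sum of all pairwise graph distances) is used only as one familiar example of a distance-based invariant, not necessarily one of the descriptors proposed in the paper.

```python
from collections import defaultdict


def star_graph_distance_matrix(sequence):
    """Distance matrix of a star-like graph built from a sequence.

    Assumed construction (illustrative only): vertex 0 is the center;
    the k-th occurrence of an amino acid sits at depth k on the branch
    reserved for that amino acid. Rows and columns after the center
    follow the order of residues in the sequence.
    """
    occurrences = defaultdict(int)
    vertices = [("center", 0)]  # (branch label, depth from center)
    for residue in sequence:
        occurrences[residue] += 1
        vertices.append((residue, occurrences[residue]))

    n = len(vertices)
    dist = [[0] * n for _ in range(n)]
    for i, (branch_i, depth_i) in enumerate(vertices):
        for j, (branch_j, depth_j) in enumerate(vertices):
            if branch_i == branch_j:
                # Same branch: walk along the path.
                dist[i][j] = abs(depth_i - depth_j)
            else:
                # Different branches (or the center): go through the center.
                dist[i][j] = depth_i + depth_j
    return dist


def wiener_index(dist):
    """Sum of distances over all unordered vertex pairs."""
    n = len(dist)
    return sum(dist[i][j] for i in range(n) for j in range(i + 1, n))


if __name__ == "__main__":
    # Toy sequences, not real proteins: relabeling the residues
    # (A -> X, B -> Y) changes the drawing but not the invariant.
    print(wiener_index(star_graph_distance_matrix("AAB")))
    print(wiener_index(star_graph_distance_matrix("XXY")))
```

Because the matrix depends only on branch membership and depth, swapping which amino acid is drawn on which branch leaves the distances, and therefore any invariant computed from them, unchanged.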
