Representation of proteins as walks in 20-D space

A novel representation of proteins is introduced that does not depend on arbitrary choices of labels for the 20 natural amino acids. The approach assigns one of the 20 unit vectors of a 20-dimensional vector space to each of the 20 natural amino acids. A protein is then represented by a walk, that is, a sequence of steps in the 20-dimensional space, analogous to the walk in the (x, y) plane used for binary strings. A straightforward numerical characterization of a protein is obtained from the distance matrix associated with its walk in 20-dimensional space, which combines the information on the Euclidean distances between the various amino acids in the protein sequence. The Line Distance matrix offers an additional numerical characterization of proteins, while the lengths of the steps of the walk in 20-D space allow construction of a "protein profile," which represents the distribution of the average lengths of the steps and their powers.

†Visitor, Emeritus, Department of Mathematics and Computer Science, Drake University, Des Moines, IA, USA.
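The walk and its distance matrix can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: each residue is taken to move the walker one unit along its own axis, so each walk point is the running sum of unit vectors, and the distance matrix is taken to hold the Euclidean distances between all pairs of walk points. The names `AMINO_ACIDS`, `AXIS`, `walk_points`, and `distance_matrix`, and the three-residue input, are hypothetical and chosen only for illustration. The sketch does not cover the Line Distance matrix or the protein profile.

```python
import math

# One-letter codes of the 20 natural amino acids. The order is arbitrary:
# it only decides which axis each residue gets (an assumption of this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AXIS = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def walk_points(sequence, axis=AXIS):
    """Return the points visited by the walk, one per residue.

    Assumed rule: each residue adds its unit vector to the current position.
    """
    position = [0] * len(axis)
    points = []
    for residue in sequence:
        position[axis[residue]] += 1
        points.append(tuple(position))
    return points


def distance_matrix(points):
    """Euclidean distances between every pair of walk points."""
    return [[math.dist(p, q) for q in points] for p in points]


# Hypothetical three-residue sequence, used only to show the mechanics.
points = walk_points("AAC")
D = distance_matrix(points)
print(round(D[0][2], 3))  # distance between the first and third walk points

# Assigning the axes in a different order leaves the distances unchanged.
reversed_axis = {aa: i for i, aa in enumerate(reversed(AMINO_ACIDS))}
print(distance_matrix(walk_points("AAC", reversed_axis)) == D)
```

Under this assumed rule, the difference between two walk points is the vector of residue counts for the segment between them, so each matrix entry depends only on the composition of that segment and not on which axis was given to which amino acid.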
