Comparative Study of Topological Indices of Macro/Supramolecular RNA Complex Networks

RNA function annotation is often based on alignment to a previously studied template. Unlike for proteins, few alignment-free methods are available to predict RNA function when alignment fails. The use of topological indices (TIs) of RNA complex networks (CNs) to find quantitative structure-activity relationships (QSAR) may offer an alternative that incorporates secondary structure or sequence-to-sequence similarity. Here, we introduce new QSAR-like techniques using RNA macromolecular CNs (mmCNs), where nodes are nucleotides, or RNA supramolecular CNs (smCNs), where nodes are RNA sequences. We studied a data set of 198 sequences including 18S-rRNAs (important phylogenetic molecular biomarkers). We constructed three types of RNA mmCNs: sequence-linear (SL-mmCNs), Cartesian-lattice (CL-mmCNs), and sequence-folding (SF-mmCNs); and two smCNs: the sequence-sequence disagreement CN (SSD-smCN) and the sequence-sequence similarity CN (SSS-smCN). We report the first comparative QSAR study with all these CIs and CNs, which includes: (i) spectral moments (^(i)μ_d(w)) of SL-mmCNs (accuracy = 75.3%), (ii) electrostatic CIs (ξ_d) of CL-mmCNs (>90%), (iii) thermodynamic parameters (ΔG, ΔH, ΔS, and T_m) of SF-mmCNs (64.7%), (iv) disagreement-distribution moments (M_k) of the SSD-smCN (79.3%), and (v) node centralities of the SSD-smCN (78.0%). Furthermore, we report the experimental isolation of a new RNA sequence from Psidium guajava leaf tissue, together with its QSAR and BLAST prediction, to illustrate the practical use of these methods. We also investigated the use of these CNs to explore rRNA diversity in bacteria, plants, and parasites from the Dactylogyrus genus. The HPL-mmCNs model was the best of all those found. All the CNs and TIs, except SF-mmCNs, are introduced here for the first time for the QSAR study of RNA, which allowed a comparative study for RNA classification.
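As a rough illustration of what a spectral moment of a sequence-linear network could look like, the sketch below treats an RNA sequence as a path graph (each nucleotide linked to its successor) and computes the k-th moment as the trace of the k-th power of the adjacency matrix. This is a minimal sketch under stated assumptions: the function names are hypothetical, the graph is unweighted, and the moments used in the study may carry nucleotide-specific weights that are not reproduced here.

```python
def linear_adjacency(sequence):
    """Adjacency matrix of a path graph: nucleotide i is linked to nucleotide i+1.

    Assumption: an unweighted sequence-linear network; nucleotide identity is ignored.
    """
    n = len(sequence)
    adj = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        adj[i][i + 1] = 1
        adj[i + 1][i] = 1
    return adj


def matmul(a, b):
    """Plain square-matrix product, kept dependency-free for illustration."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(adj, max_order):
    """Return [tr(A^1), ..., tr(A^max_order)] for adjacency matrix A."""
    moments = []
    power = adj
    for _ in range(max_order):
        moments.append(sum(power[i][i] for i in range(len(adj))))
        power = matmul(power, adj)
    return moments


# Hypothetical four-nucleotide example.
moments = spectral_moments(linear_adjacency("GAUC"), 4)
print(moments)  # [0, 6, 0, 14]
```

In a QSAR-style workflow, a vector of such moments per sequence would serve as the input descriptors for a classifier; odd-order moments vanish here because an unweighted path graph has no closed walks of odd length.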
