Self-Organizing Map for Characterizing Heterogeneous Nucleotide and Amino Acid Sequence Motifs

A self-organizing map (SOM) is an artificial neural network algorithm that learns from training data consisting of objects expressed as vectors. It performs non-hierarchical clustering, mapping the input vectors onto discrete clusters so that vectors assigned to the same cluster share similar numeric or alphanumeric features. SOM has been widely used in transcriptomics to identify co-expressed genes as candidates for co-regulated genes. I envision SOM as having great potential for characterizing heterogeneous sequence motifs, and I aim to illustrate this potential through a parallel presentation of SOM applied to a set of numerical vectors and to a set of equal-length sequence motifs. While there are numerous biological applications of SOM involving numerical vectors, few studies have used SOM to characterize heterogeneous sequence motifs. This paper is intended to encourage (1) researchers to study SOM in this new domain and (2) computer programmers to develop user-friendly SOM tools that biologists can use for motif characterization.
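The core SOM procedure can be illustrated with a short sketch. The code below is a minimal, pure-Python version of the standard online Kohonen SOM, run in parallel on a few numerical vectors and on a few equal-length nucleotide motifs. Several choices here are assumptions made only for illustration and are not taken from this paper: motifs are one-hot encoded so they can be compared by squared Euclidean distance, the neighbourhood is Gaussian, the learning rate and radius decay linearly, and the helper names (`one_hot`, `train_som`, `assign`) and toy data are hypothetical. The paper's own choice of distance for motifs is not shown in this excerpt.

```python
import math
import random

def one_hot(seq, alphabet="ACGT"):
    """Encode an equal-length motif as a flat 0/1 vector of length len(seq) * len(alphabet)."""
    vec = []
    for ch in seq:
        vec.extend(1.0 if ch == a else 0.0 for a in alphabet)
    return vec

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_unit(nodes, x):
    """Grid coordinate of the node whose prototype is closest to x (ties go to the first node)."""
    return min(nodes, key=lambda n: sq_dist(nodes[n], x))

def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Online Kohonen SOM on a rows x cols grid; returns {(r, c): prototype vector}."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialise every prototype as a copy of a randomly chosen input vector.
    nodes = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            bmu = best_matching_unit(nodes, x)
            for (r, c), w in nodes.items():
                # Gaussian neighbourhood on the map grid, centred on the BMU.
                h = math.exp(-((r - bmu[0]) ** 2 + (c - bmu[1]) ** 2) / (2 * sigma ** 2))
                for i, xi in enumerate(x):
                    w[i] += lr * h * (xi - w[i])     # pull prototype toward x
            step += 1
    return nodes

def assign(data, nodes):
    """Map each input vector to the grid cell (cluster) of its best-matching unit."""
    return [best_matching_unit(nodes, x) for x in data]

# Hypothetical toy data: numerical vectors and equal-length motifs use the same code;
# the motifs are one-hot encoded first.
profiles = [[0.1, 0.2, 0.9], [0.0, 0.3, 1.0], [0.9, 0.8, 0.1], [1.0, 0.7, 0.0]]
motifs = ["ACGTAC", "ACGTAA", "ACGTCC", "TTGACA", "TTGACT", "TAGACA"]
motif_vecs = [one_hot(m) for m in motifs]

profile_map = train_som(profiles, rows=2, cols=2)
motif_map = train_som(motif_vecs, rows=2, cols=2)
for m, cell in zip(motifs, assign(motif_vecs, motif_map)):
    print(m, "->", cell)
```

Under one-hot encoding, each motif prototype keeps four non-negative values per position that sum to 1. A prototype can therefore be read as a position frequency profile of the motifs mapped to its cell. The exact cluster assignments depend on the seed, the grid size, and the decay schedule.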
