An estimator for local analysis of genome based on the minimal absent word.

This study presents an alternative alignment-free relative feature analysis method based on the minimal absent word, which has potential advantages over the local alignment method in local analysis. Smooth-local-analysis-curve and similarity-distribution are constructed for a fast, efficient, and visual comparison. Moreover, when the multi-sequence-comparison is needed, the local-analysis-curves can illustrate some interesting zones.

[1]  Antonio Restivo,et al.  An Extension of the Burrows Wheeler Transform and Applications to Sequence Comparison and Data Compression , 2005, CPM.

[2]  K. Chou Prediction of protein cellular attributes using pseudo‐amino acid composition , 2001, Proteins.

[3]  Junjie Chen,et al.  Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences , 2015, Nucleic Acids Res..

[4]  Hong Luo,et al.  CVTree: a phylogenetic tree reconstruction tool based on whole genomes , 2004, Nucleic Acids Res..

[5]  Dongmei Ai,et al.  Efficient statistical significance approximation for local similarity analysis of high-throughput time series data , 2013, Bioinform..

[6]  Xinguo Lu,et al.  A novel graphical representation of protein sequences and its application , 2011, J. Comput. Chem..

[7]  Xin Wang,et al.  PseAAC-Builder: a cross-platform stand-alone program for generating various special Chou's pseudo-amino acid compositions. , 2012, Analytical biochemistry.

[8]  Changchuan Yin,et al.  A new method to cluster DNA sequences using Fourier power spectrum , 2015, Journal of Theoretical Biology.

[9]  K. Chou Some remarks on protein attribute prediction and pseudo amino acid composition , 2010, Journal of Theoretical Biology.

[10]  K. Chou,et al.  Recent Progress in Predicting Posttranslational Modification Sites in Proteins. , 2015, Current topics in medicinal chemistry.

[11]  Brian T. Foley,et al.  HIV-1 Subtype and Circulating Recombinant Form (CRF) Reference Sequences, 2005 , 2005 .

[12]  Sukanta Mondal,et al.  Chou's pseudo amino acid composition improves sequence-based antifreeze protein prediction. , 2014, Journal of theoretical biology.

[13]  Antonio Restivo,et al.  Distance measures for biological sequences: Some recent approaches , 2008, Int. J. Approx. Reason..

[14]  Ren Long,et al.  iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition , 2016, Bioinform..

[15]  Kuo-Chen Chou,et al.  Some remarks on predicting multi-label attributes in molecular biosystems. , 2013, Molecular bioSystems.

[16]  Xiangde Zhang,et al.  Alignment free comparison: k word voting model and its applications. , 2013, Journal of theoretical biology.

[17]  Xiangde Zhang,et al.  The Burrows-Wheeler similarity distribution between biological sequences based on Burrows-Wheeler transform. , 2010, Journal of theoretical biology.

[18]  Jacques Lapointe,et al.  Theoretical and experimental biology in one—A symposium in honour of Professor Kuo-Chen Chou’s 50th anniversary and Professor Richard Giegé’s 40th anniversary of their scientific careers , 2013 .

[19]  Wei Chen,et al.  iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition , 2013, Nucleic acids research.

[20]  Andrew D. Smith,et al.  A Geometric Interpretation for Local Alignment-Free Sequence Comparison , 2013, J. Comput. Biol..

[21]  K. Chou,et al.  PseKNC: a flexible web server for generating pseudo K-tuple nucleotide composition. , 2014, Analytical biochemistry.

[22]  Burkhard Morgenstern,et al.  Fast alignment-free sequence comparison using spaced-word frequencies , 2014, Bioinform..

[23]  Manish Kumar,et al.  Prediction of β-lactamase and its class by Chou's pseudo-amino acid composition and support vector machine. , 2015, Journal of theoretical biology.

[24]  Zaheer Ullah Khan,et al.  Discrimination of acidic and alkaline enzyme using Chou's pseudo amino acid composition in conjunction with probabilistic neural network model. , 2015, Journal of theoretical biology.

[25]  David Burstein,et al.  The Average Common Substring Approach to Phylogenomic Reconstruction , 2006, J. Comput. Biol..

[26]  Yan Li,et al.  Comparison study on statistical features of predicted secondary structures for protein structural class prediction: From content to position , 2013, BMC bioinformatics.

[27]  Kuo-Chen Chou,et al.  Identification of protein-protein binding sites by incorporating the physicochemical properties and stationary wavelet transforms into pseudo amino acid composition , 2016, Journal of biomolecular structure & dynamics.

[28]  K. Chou Impacts of bioinformatics to medicinal chemistry. , 2015, Medicinal chemistry (Shariqah (United Arab Emirates)).

[29]  Ling Li,et al.  Similarity/dissimilarity studies of protein sequences based on a new 2D graphical representation , 2010, J. Comput. Chem..

[30]  Matteo Comin,et al.  Alignment-free phylogeny of whole genomes using underlying subwords , 2012, Algorithms for Molecular Biology.

[31]  Randy Goebel,et al.  Nucleotide composition string selection in HIV-1 subtyping using whole genomes , 2007, Bioinform..

[32]  Wei Chen,et al.  PseKNC-General: a cross-platform package for generating various modes of pseudo nucleotide compositions , 2015, Bioinform..

[33]  Wei Chen,et al.  iTIS-PseTNC: a sequence-based predictor for identifying translation initiation site in human genes using pseudo trinucleotide composition. , 2014, Analytical biochemistry.

[34]  K. Chou,et al.  Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences. , 2015, Molecular bioSystems.

[35]  B. Liu,et al.  Identification of microRNA precursor with the degenerate K-tuple or Kmer strategy. , 2015, Journal of theoretical biology.

[36]  K. Chou,et al.  iRNA-Methyl: Identifying N(6)-methyladenosine sites using pseudo nucleotide composition. , 2015, Analytical biochemistry.

[37]  Changchuan Yin,et al.  An improved model for whole genome phylogenetic analysis by Fourier transform. , 2015, Journal of theoretical biology.

[38]  Xiangde Zhang,et al.  Large Local Analysis of the Unaligned Genome and Its Application , 2013, J. Comput. Biol..

[39]  Tuan D. Pham,et al.  A probabilistic measure for alignment-free sequence comparison , 2004, Bioinform..

[40]  J. Welsh,et al.  Molecular classification of human carcinomas by use of gene expression signatures. , 2001, Cancer research.

[41]  W. Zhong,et al.  Molecular Science for Drug Development and Biomedicine , 2014, International journal of molecular sciences.

[42]  Kuo-Chen Chou,et al.  Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes , 2005, Bioinform..

[43]  De-Shuang Huang,et al.  Novel graphical representation of genome sequence and its applications in similarity analysis , 2012 .

[44]  Kuo-Chen Chou,et al.  iPPI-Esml: An ensemble classifier for identifying the interactions of proteins by incorporating their physicochemical properties and wavelet transforms into PseAAC. , 2015, Journal of theoretical biology.

[45]  Dong-Sheng Cao,et al.  propy: a tool to generate various modes of Chou's PseAAC , 2013, Bioinform..

[46]  Jonas S. Almeida,et al.  Alignment-free sequence comparison-a review , 2003, Bioinform..

[47]  Pufeng Du,et al.  PseAAC-General: Fast Building Various Modes of General Form of Chou’s Pseudo-Amino Acid Composition for Large-Scale Protein Datasets , 2014, International journal of molecular sciences.

[48]  Thomas Wiehe,et al.  Estimating Mutation Distances from Unaligned Genomes , 2009, J. Comput. Biol..

[49]  C. Zhang,et al.  Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve. , 2000, Nucleic acids research.

[50]  Khalid Sayood,et al.  A new sequence distance measure for phylogenetic tree construction , 2003, Bioinform..

[51]  Benny Chor,et al.  Detecting Phylogenetic Signals in Eukaryotic Whole Genome Sequences , 2012, J. Comput. Biol..

[52]  K. Chou,et al.  iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins , 2013, PeerJ.

[53]  K. Chou,et al.  iDNA-Methyl: identifying DNA methylation sites via pseudo trinucleotide composition. , 2015, Analytical biochemistry.

[54]  Wei Chen,et al.  iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition , 2014, Bioinform..

[55]  James G. Lyons,et al.  Gram-positive and Gram-negative protein subcellular localization by incorporating evolutionary-based descriptors into Chou׳s general PseAAC. , 2015, Journal of theoretical biology.

[56]  Xin Chen,et al.  An information-based sequence distance and its application to whole mitochondrial genome phylogeny , 2001, Bioinform..

[57]  Alexandru T Balaban,et al.  Graphical representation of proteins. , 2011, Chemical reviews.

[58]  Qiuwen Zhang,et al.  MultiP-SChlo: Multi-label protein subchloroplast localization prediction , 2014, 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

[59]  K. Chou,et al.  iCTX-Type: A Sequence-Based Predictor for Identifying the Types of Conotoxins in Targeting Ion Channels , 2014, BioMed research international.

[60]  Ying Wang,et al.  Comparison of Metatranscriptomic Samples Based on k-Tuple Frequencies , 2014, PloS one.

[61]  Wei Chen,et al.  iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition , 2014, Nucleic acids research.

[62]  Xiaolong Wang,et al.  repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects , 2015, Bioinform..

[63]  Ping-an He,et al.  A novel descriptor of protein sequences and its application. , 2014, Journal of theoretical biology.