An estimator for local analysis of genome based on the minimal absent word.

This study presents an alignment-free method for relative feature analysis based on minimal absent words, which has potential advantages over local alignment for local analysis. A smooth local-analysis curve and a similarity distribution are constructed for fast, efficient, and visual comparison. Moreover, when a multi-sequence comparison is needed, the local-analysis curves can highlight zones of interest.
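For readers unfamiliar with the term, a word is a minimal absent word of a sequence when it does not occur in the sequence but every proper factor of it does; it is enough to check the word with its first letter removed and the word with its last letter removed. The following is a minimal brute-force sketch of that definition only. It is not the estimator described in the paper, and the function name `minimal_absent_words` and the parameters `alphabet` and `max_len` are hypothetical, chosen for illustration.

```python
def minimal_absent_words(seq, alphabet="ACGT", max_len=6):
    """Brute-force minimal absent words (MAWs) of seq, up to length max_len.

    Illustrative sketch of the definition only; not the paper's method.
    """
    # Collect every factor (substring) of seq up to max_len, plus the empty word.
    factors = {""}
    for i in range(len(seq)):
        for j in range(i + 1, min(i + max_len, len(seq)) + 1):
            factors.add(seq[i:j])

    maws = set()
    # Extend each occurring factor by one letter on the right. The result w is
    # a MAW when w is absent while w without its first letter still occurs
    # (w without its last letter occurs by construction).
    for left in factors:
        if len(left) >= max_len:
            continue
        for letter in alphabet:
            w = left + letter
            if w not in factors and w[1:] in factors:
                maws.add(w)

    # Sort by length, then alphabetically, for stable output.
    return sorted(maws, key=lambda w: (len(w), w))


# Textbook example over a two-letter alphabet.
print(minimal_absent_words("ABAAB", alphabet="AB"))
# → ['BB', 'AAA', 'BAB', 'AABA']
```

The brute-force enumeration is quadratic in the sequence length for a fixed `max_len`, so it is only suitable for short sequences; genome-scale use would need an indexed approach.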

Chenhui Yang | Xiangde Zhang | Lianping Yang | Haoyue Fu
