Using deformation energy to analyze nucleosome positioning in genomes.

By modulating the accessibility of genomic regions to regulatory proteins, nucleosome positioning plays important roles in cellular processes. Despite intensive efforts, the rules that determine nucleosome positioning are still far from satisfactorily understood. In this study, we developed a biophysical model that predicts nucleosomal sequences from the deformation energy of DNA sequences, and validated it against the experimentally determined nucleosome positions in the Saccharomyces cerevisiae genome, achieving very high success rates. Furthermore, using the deformation energy model, we analyzed the distribution of nucleosomes around three types of DNA functional sites: (1) double-strand breaks (DSBs), (2) single nucleotide polymorphisms (SNPs), and (3) origins of replication (ORIs). The energy spectra show a pronounced "trough" or "valley" around each of these functional sites, implying a depletion of nucleosome density, in accordance with experimental observations. These findings indicate that deformation energy may play a key role in accurately predicting nucleosome positions, and that it provides a quantitative physical approach to understanding the mechanism of nucleosome positioning in depth.
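The abstract does not state the functional form of the model, so the following is only a minimal sketch of how a deformation-energy profile could be computed, assuming a harmonic (elastic) energy summed over dinucleotide steps within a sliding 147 bp window. The function names, the single "roll" degree of freedom, the 10.4 bp periodic target shape, and every numeric parameter are hypothetical placeholders chosen for illustration, not values or methods taken from the study.

```python
import math

BASES = "ACGT"

def toy_step_params(step):
    """Return (equilibrium_roll, stiffness) for one dinucleotide step.

    HYPOTHETICAL placeholder rule, not measured data: both values are
    derived only from the GC count of the step so the sketch is runnable.
    """
    gc = sum(base in "GC" for base in step)
    equilibrium_roll = 1.0 + 2.0 * (2 - gc)
    stiffness = 0.02 + 0.01 * gc
    return equilibrium_roll, stiffness

# Lookup table for all 16 dinucleotide steps (toy values, see above).
STEP_PARAMS = {a + b: toy_step_params(a + b) for a in BASES for b in BASES}

def deformation_energy(window, helical_repeat=10.4, amplitude=5.0):
    """Harmonic energy needed to force `window` into an assumed target shape.

    The target roll at step i is a cosine with the given helical repeat;
    this periodic target is an assumption of the sketch.
    """
    energy = 0.0
    for i in range(len(window) - 1):
        equilibrium_roll, stiffness = STEP_PARAMS[window[i:i + 2]]
        target_roll = amplitude * math.cos(2 * math.pi * i / helical_repeat)
        energy += 0.5 * stiffness * (target_roll - equilibrium_roll) ** 2
    return energy

def energy_profile(sequence, window_size=147):
    """Deformation energy of every `window_size`-bp window along `sequence`.

    Expects an uppercase sequence containing only A, C, G and T.
    """
    n_windows = len(sequence) - window_size + 1
    return [deformation_energy(sequence[start:start + window_size])
            for start in range(n_windows)]
```

Under these assumptions, calling `energy_profile` on a genomic region and aligning the resulting profiles at DSB, SNP, or ORI coordinates would give averaged energy spectra of the kind the abstract describes; real use would require empirically derived step parameters in place of the toy rule.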
