Prediction of DNase I Hypersensitive Sites by Using Pseudo Nucleotide Compositions

DNase I hypersensitive sites (DHS) are associated with a wide variety of regulatory DNA elements. Knowing the locations of DHS helps decipher the function of noncoding genomic regions. With the rapid accumulation of genome sequences in the postgenomic age, cost-effective computational methods for identifying DHS are highly desirable. In the present work, a support vector machine (SVM)-based model was proposed to identify DHS using pseudo dinucleotide composition (PseDNC). In the jackknife test, the proposed model obtained an accuracy of 83%, which is competitive with that of the existing method. This result suggests that the proposed model may become a useful tool for DHS identification.
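To make the pseudo dinucleotide composition concrete, here is a minimal sketch of how a PseDNC feature vector is commonly computed: 16 normalized dinucleotide frequencies followed by λ sequence-order correlation factors, each correlation factor averaging the squared differences of standardized physicochemical property values between dinucleotides j steps apart, with a weight w balancing the two parts. The function names, the defaults `lam=3` and `w=0.05`, and the property table passed in are assumptions for illustration only; they are not the properties or parameter values used in this work.

```python
from itertools import product
from typing import Dict, List, Sequence

# The 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES: List[str] = ["".join(p) for p in product("ACGT", repeat=2)]


def standardize(raw: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Z-score each property across the 16 dinucleotides (hypothetical helper)."""
    n_props = len(next(iter(raw.values())))
    out: Dict[str, List[float]] = {d: [] for d in DINUCLEOTIDES}
    for g in range(n_props):
        vals = [raw[d][g] for d in DINUCLEOTIDES]
        mean = sum(vals) / 16
        sd = (sum((v - mean) ** 2 for v in vals) / 16) ** 0.5
        for d in DINUCLEOTIDES:
            out[d].append((raw[d][g] - mean) / sd if sd else 0.0)
    return out


def theta(di1: str, di2: str, props: Dict[str, Sequence[float]]) -> float:
    """Squared property differences of two dinucleotides, averaged over properties."""
    p1, p2 = props[di1], props[di2]
    return sum((a - b) ** 2 for a, b in zip(p1, p2)) / len(p1)


def pse_dnc(seq: str, props: Dict[str, Sequence[float]],
            lam: int = 3, w: float = 0.05) -> List[float]:
    """Return the (16 + lam)-dimensional PseDNC vector of an A/C/G/T sequence."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    L = len(seq)
    if not 1 <= lam < L - 1:
        raise ValueError("need 1 <= lam < len(seq) - 1")
    dinucs = [seq[i:i + 2] for i in range(L - 1)]

    # Normalized occurrence frequencies of the 16 dinucleotides.
    counts = {d: 0 for d in DINUCLEOTIDES}
    for d in dinucs:
        counts[d] += 1
    freqs = [counts[d] / len(dinucs) for d in DINUCLEOTIDES]

    # j-tier correlation factors: average over the L - j - 1 dinucleotide pairs
    # that are j positions apart.
    thetas = []
    for j in range(1, lam + 1):
        n_pairs = L - j - 1
        total = sum(theta(dinucs[i], dinucs[i + j], props) for i in range(n_pairs))
        thetas.append(total / n_pairs)

    # Shared denominator, so the whole vector sums to 1.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

Each sequence yields a fixed-length vector regardless of its length, so the vectors can be passed directly to an SVM. In a jackknife (leave-one-out) test, every sample is held out once, the classifier is trained on all remaining samples, and accuracy is the fraction of held-out samples predicted correctly.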
