Exploring the relationship between fractal features and bacterial essential genes

Essential genes are indispensable for the survival of an organism under optimal conditions. Rapid and accurate identification of new essential genes is of great theoretical and practical significance, and finding features with predictive power is fundamental to it. Here, we calculate six fractal features from primary gene and protein sequences and explore their relationship with gene essentiality using statistical analysis and machine learning-based methods. The analysis covers all currently available identified genes of 27 bacteria in the Database of Essential Genes (DEG). The fractal features of essential genes are found to differ in general from those of non-essential genes. The fractal features are then used to fit the parameters of two machine learning classifiers, Naive Bayes and Random Forest. The area under the curve (AUC) values of both classifiers show that each fractal feature, taken individually, discriminates satisfactorily between essential and non-essential genes. Although significant correlations exist among the fractal features, gene essentiality can also be reliably predicted by various combinations of them. Thus, the fractal features analyzed in our study can be used not only to construct a good essentiality classifier on their own, but also as significant contributors to computational tools for identifying essential genes.
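As an illustration of the kind of pipeline described above, the following is a minimal sketch, not the authors' implementation. The abstract does not name the six fractal features, so the sketch assumes one plausible choice: a box-counting dimension estimated from a chaos game representation (CGR) of a DNA sequence. The corner assignment, the grid levels, and the function names `cgr_points`, `box_counting_dimension`, and `auc` are all assumptions made for this example. The `auc` helper shows how a single feature could be scored for its ability to separate essential from non-essential genes, using the rank-based definition of AUC.

```python
import math

# Assumed corner layout for the four bases; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos game representation: step halfway toward each base's corner."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def box_counting_dimension(points, levels=(1, 2, 3, 4)):
    """Least-squares slope of log N(k) against log 2^k, where N(k) is the
    number of occupied cells on a 2^k x 2^k grid over the unit square."""
    if not points:
        raise ValueError("at least one point is required")
    xs, ys = [], []
    for k in levels:
        n = 2 ** k
        boxes = {
            (min(int(px * n), n - 1), min(int(py * n), n - 1))
            for px, py in points
        }
        xs.append(k * math.log(2))
        ys.append(math.log(len(boxes)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    den = sum((a - mean_x) ** 2 for a in xs)
    return num / den


def auc(positive_scores, negative_scores):
    """Rank-based AUC: the probability that a random positive scores higher
    than a random negative, counting ties as one half."""
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the call pattern.
    feature = box_counting_dimension(cgr_points("ATGGCGTACGTTAGCATCGA"))
    print(feature)
```

In practice, one such feature value would be computed per gene, and `auc` would be called with the values for the essential genes and the values for the non-essential genes. An AUC near 0.5 means the feature carries little discriminative information on its own. An AUC far below 0.5 means the feature separates the classes in the opposite direction, so it is still informative once its sign is flipped.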
