Genetic algorithm and optimized weight matrix application for peroxisome proliferator response elements recognition: Prerequisites of accuracy growth for wide genome research

Development of reliable transcription factor binding site (TFBS) recognition methods is an important step in large-scale genome analysis. Most currently applied methods for predicting functional TFBSs are hampered by high false-positive rates, which arise when too few functionally characterized sequences are available and only sequence conservation within the site core is considered. We propose two methods to search for binding sites (BSs) of the peroxisome proliferator-activated receptor (PPAR), known as peroxisome proliferator response elements (PPREs). The first is an optimized dinucleotide position weight matrix (PWM) model; the second is the SiteGA model, which uses a genetic algorithm with a discriminant function of locally positioned dinucleotides to infer the most important positions and dinucleotides. We used two PPRE datasets in our analysis, consisting of 37 and 98 BSs, respectively. We showed that extending the dataset improved the accuracy of the SiteGA model but not of the PWM model. Finally, we applied both models (PWM and SiteGA) in combination to a dataset of annotated human promoters (EPD). We demonstrated that the larger dataset and the longer window length led to notable accuracy gains for the PWM and SiteGA models. Consequently, applying PWM and SiteGA in combination may better restrict the number of potential targets in the EPD promoter dataset.
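The dinucleotide PWM idea can be illustrated with a minimal Python sketch: each matrix column scores the dinucleotide at two adjacent site positions, and a window's score is the sum of its column scores. It assumes equal-length aligned sites, a uniform 1/16 dinucleotide background, and a pseudocount of 0.5; the function names, the threshold of 10, and the toy sequences are hypothetical. The toy sites are invented, only loosely shaped like a DR1 direct repeat (two AGGTCA half-sites separated by one base), and the optimization step of the paper's PWM model is not reproduced here.

```python
import math
from itertools import product

# The 16 dinucleotides "AA", "AC", ..., "TT".
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def build_dinuc_pwm(sites, pseudocount=0.5):
    """Log2-odds dinucleotide matrix from aligned, equal-length sites.

    Column i scores the dinucleotide at positions (i, i + 1), so sites of
    length L yield L - 1 columns. Background is ASSUMED uniform (1/16).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to the same length")
    pwm = []
    for i in range(length - 1):
        counts = {d: pseudocount for d in DINUCS}
        for s in sites:
            d = s[i:i + 2].upper()
            if d in counts:  # ignore dinucleotides containing N or other symbols
                counts[d] += 1
        total = sum(counts.values())
        # log2(p / (1/16)) written as log2(16 * p)
        pwm.append({d: math.log2(16 * c / total) for d, c in counts.items()})
    return pwm


def score_window(pwm, window):
    """Sum of column scores; unrecognized dinucleotides contribute 0."""
    return sum(col.get(window[i:i + 2].upper(), 0.0) for i, col in enumerate(pwm))


def scan(pwm, sequence, threshold):
    """Return (offset, score) for every window whose score reaches the threshold."""
    width = len(pwm) + 1
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score_window(pwm, sequence[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits


# Invented toy sites (NOT real PPREs), loosely shaped like a DR1 repeat.
toy_sites = ["AGGTCAAAGGTCA", "AGGTCAAAGGGCA", "TGGTCAAAGGTCA", "AGGTCATAGGTCA"]
pwm = build_dinuc_pwm(toy_sites)
hits = scan(pwm, "CCCCCAGGTCAAAGGTCACCCCC", threshold=10.0)
```

With these toy sites the highest-scoring window is the embedded consensus at offset 5. In a real analysis the threshold would typically be calibrated against an acceptable false-positive rate rather than fixed by hand.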
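The SiteGA idea of letting a genetic algorithm choose which locally positioned dinucleotides enter a discriminant function can be illustrated with a toy Python sketch. It is not the published SiteGA implementation. The feature encoding (region start, region width, dinucleotide), the diagonal LDA-style weights, the Fisher-like fitness, the elitist GA operators, every parameter value, and the toy data are assumptions made for illustration. Fitness is computed on the training sequences only, so an honest accuracy estimate would need held-out data, for example a jackknife.

```python
import random
from itertools import product
from statistics import mean, pvariance

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def feature_value(seq, feat):
    """Frequency of dinucleotide d among the `width` dinucleotides starting at `start`."""
    start, width, d = feat
    region = seq[start:start + width + 1]
    hits = sum(region[i:i + 2] == d for i in range(len(region) - 1))
    return hits / width


def discriminant(features, pos, neg):
    """ASSUMED diagonal LDA-style score: weight = mean difference / pooled variance."""
    weights = []
    for f in features:
        p = [feature_value(s, f) for s in pos]
        n = [feature_value(s, f) for s in neg]
        weights.append((mean(p) - mean(n)) / (pvariance(p) + pvariance(n) + 1e-6))

    def score(seq):
        return sum(w * feature_value(seq, f) for w, f in zip(weights, features))

    return score


def fitness(features, pos, neg):
    """ASSUMED Fisher-like separation of discriminant scores (training data only)."""
    score = discriminant(features, pos, neg)
    sp = [score(s) for s in pos]
    sn = [score(s) for s in neg]
    return (mean(sp) - mean(sn)) ** 2 / (pvariance(sp) + pvariance(sn) + 1e-6)


def random_feature(length, rng):
    """Random (start, width, dinucleotide) region that fits inside the site."""
    width = rng.randint(1, 4)
    start = rng.randint(0, length - width - 1)
    return (start, width, rng.choice(DINUCS))


def evolve(pos, neg, n_features=4, pop_size=24, generations=30, seed=0):
    """Hypothetical elitist GA over sets of locally positioned dinucleotide features."""
    rng = random.Random(seed)
    length = len(pos[0])
    pop = [[random_feature(length, rng) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda ind: fitness(ind, pos, neg), reverse=True)
        next_pop = ranked[: pop_size // 5]  # elitism: best 20% survive unchanged
        while len(next_pop) < pop_size:
            a, b = rng.sample(ranked[: pop_size // 2], 2)  # parents from top half
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: each feature is redrawn with probability 0.2.
            child = [random_feature(length, rng) if rng.random() < 0.2 else f
                     for f in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=lambda ind: fitness(ind, pos, neg))
    return best, discriminant(best, pos, neg)


# Invented toy data: "sites" carry AGGTCA at a fixed offset; background is random.
data_rng = random.Random(1)


def rand_seq(n):
    return "".join(data_rng.choice("ACGT") for _ in range(n))


pos = [rand_seq(4) + "AGGTCA" + rand_seq(3) for _ in range(20)]
neg = [rand_seq(13) for _ in range(20)]
best, score = evolve(pos, neg)
```

Because the best fifth of each generation is copied unchanged, the best fitness never decreases from one generation to the next. In a combined setup, a window might be reported only if it passes both a SiteGA-style threshold and a PWM threshold, which is one simple way to trade some sensitivity for fewer false positives.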
