Mining Regulatory Elements in the Plasmodium falciparum Genome Using Gene Expression Data

There is very little information available about gene regulatory relationships in Plasmodium falciparum. To discover transcription factor binding motifs (TFBMs) in P. falciparum, we considered two approaches. In the first approach, gene expression data from all conditions were fed into the Iterative Signature Algorithm (ISA), which outputs modules composed of sets of genes associated with co-regulating conditions. Potential TFBMs were then discovered by applying AlignACE to the resulting gene sets. In the second approach, MotifRegressor was used to generate motifs associated with induced and repressed genes at each time point; these motifs were then clustered based on the strength of their correlation with gene expression (i.e., motif coefficients) across time points. To date, a total of 637 and 840 motifs have been discovered by the MotifRegressor and ISA-AlignACE programs, respectively. All of this information was uploaded into a database, making it easy to devise complex queries. Using published information on known motifs, we were able to validate some of our results. In addition, modules consisting of putative transcription factors and related genes were investigated. This work provides a bioinformatics methodology for analyzing transcriptional regulation and TFBMs across the whole genome.

data generated by DeRisi's laboratory using transcripts from the organism Plasmodium falciparum, harvested at 46 different time points during its intraerythrocytic developmental life cycle (2). P. falciparum is one of four species of the parasitic protozoan genus Plasmodium, and is responsible for the vast majority of malaria episodes, affecting 200-300 million individuals and causing 0.7-2.7 million deaths per year worldwide.
In this paper, we focused on mining for information related to gene regulation and transcription factor binding motifs (TFBMs), which matters because direct experimental identification of TFBMs is slow and laborious. We used two recently developed algorithms to predict potential TFBMs.
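
The first approach can be made concrete with a small sketch. The Python code below is a simplified, illustrative version of the iterative refinement behind ISA, not the published implementation: starting from a seed gene set, it alternates between scoring conditions against the current genes and scoring genes against the selected conditions until both sets stop changing. The function names, the z-score normalization, the one-sided thresholds (`t_gene`, `t_cond`), and their default values are all assumptions chosen for illustration.

```python
import numpy as np


def standardize(a, axis):
    """Z-score `a` along `axis`; constant slices are mapped to zero."""
    mean = a.mean(axis=axis, keepdims=True)
    std = a.std(axis=axis, keepdims=True)
    std[std == 0] = 1.0
    return (a - mean) / std


def above_threshold(scores, t):
    """Select entries more than `t` standard deviations above the mean score."""
    return scores > scores.mean() + t * scores.std()


def isa_module(expr, seed_genes, t_gene=1.5, t_cond=1.0, max_iter=50):
    """Refine a seed gene set into a (genes, conditions) module.

    expr: 2-D array, rows are genes and columns are conditions (time points).
    Simplifying assumption: only up-regulated genes and conditions are kept.
    Returns two index arrays; both are empty if the module collapses.
    """
    e_g = standardize(expr, axis=0)  # each condition standardized across genes
    e_c = standardize(expr, axis=1)  # each gene standardized across conditions

    genes = np.zeros(expr.shape[0], dtype=bool)
    genes[list(seed_genes)] = True
    conds = np.zeros(expr.shape[1], dtype=bool)

    for _ in range(max_iter):
        if not genes.any():
            conds[:] = False
            break
        # Step 1: score every condition by the mean expression of current genes.
        cond_scores = e_g[genes].mean(axis=0)
        new_conds = above_threshold(cond_scores, t_cond)
        weights = cond_scores[new_conds]
        norm = np.abs(weights).sum()
        if norm == 0:  # no usable conditions: the module collapses
            genes[:] = False
            conds[:] = False
            break
        # Step 2: score every gene by its weighted mean over selected conditions.
        gene_scores = e_c[:, new_conds] @ weights / norm
        new_genes = above_threshold(gene_scores, t_gene)

        converged = (np.array_equal(new_genes, genes)
                     and np.array_equal(new_conds, conds))
        genes, conds = new_genes, new_conds
        if converged:
            break

    return np.flatnonzero(genes), np.flatnonzero(conds)


if __name__ == "__main__":
    # Synthetic data (not from the paper): 15 genes co-induced in 12 conditions.
    rng = np.random.default_rng(0)
    demo = rng.normal(0.0, 0.2, size=(100, 60))
    demo[:15, :12] += 4.0
    module_genes, module_conds = isa_module(demo, seed_genes=range(5))
    print("genes:", module_genes)
    print("conditions:", module_conds)
```

In a pipeline like the one described here, the upstream sequences of the genes in each returned module would then be passed to a motif finder such as AlignACE.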

[1]  M Lanzer,et al.  Control of gene expression in Plasmodium falciparum. , 1998, Molecular and biochemical parasitology.

[2]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[3]  G. Church,et al.  Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation , 1998, Nature Biotechnology.

[5]  J. Mesirov,et al.  Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[6]  A. Cowman,et al.  Genomic distribution and functional characterisation of two distinct and conserved Plasmodium falciparum var gene 5' flanking sequences. , 2000, Molecular and biochemical parasitology.

[7]  G. Church,et al.  Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. , 2000, Journal of molecular biology.

[8]  T. Triglia,et al.  A novel ligand from Plasmodium falciparum that binds to a sialic acid‐containing receptor on the surface of human erythrocytes , 2001, Molecular microbiology.

[9]  Russ B. Altman,et al.  Missing value estimation methods for DNA microarrays , 2001, Bioinform.

[10]  Yaniv Ziv,et al.  Revealing modular organization in the yeast transcriptional network , 2002, Nature Genetics.

[11]  John Quackenbush,et al.  Genesis: cluster analysis of microarray data , 2002, Bioinform.

[12]  Jun S. Liu,et al.  An algorithm for finding protein–DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments , 2002, Nature Biotechnology.

[13]  J. DeRisi,et al.  The Transcriptome of the Intraerythrocytic Developmental Cycle of Plasmodium falciparum , 2003, PLoS biology.

[14]  M. Kaestli,et al.  Identification of nuclear proteins that interact differentially with Plasmodium falciparum var gene promoters , 2003, Molecular microbiology.

[15]  Jun S. Liu,et al.  Integrating regulatory motif discovery and genome-wide expression analysis , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[16]  Sven Bergmann,et al.  Iterative signature algorithm for the analysis of large-scale gene expression data. , 2002, Physical review. E, Statistical, nonlinear, and soft matter physics.

[17]  Jacques van Helden,et al.  Regulatory Sequence Analysis Tools , 2003, Nucleic Acids Res.

[18]  G. Crooks,et al.  WebLogo: a sequence logo generator. , 2004, Genome research.

[19]  D. Wirth,et al.  Identification of regulatory elements in the Plasmodium falciparum genome. , 2004, Molecular and biochemical parasitology.