Significance analysis of lexical bias in microarray data

Background: Genes that are determined to be significantly differentially regulated in microarray analyses often appear to have functional commonalities, such as being components of the same biochemical pathway. As a result, certain words are under- or overrepresented in the list of genes. Distinguishing between biologically meaningful trends and artifacts of annotation and analysis procedures is of the utmost importance, as only true biological trends are of interest for further experimentation. A number of sophisticated methods for identifying significant lexical trends are currently available, but these methods are generally too cumbersome for practical use by most microarray users.

Results: We have developed a tool, LACK, for calculating the statistical significance of apparent lexical bias in microarray datasets. The frequency of a user-specified list of search terms in a list of differentially regulated genes is assessed for statistical significance by comparison to randomly generated datasets. The simple input files and user interface are aimed at the average microarray user who wants a statistical measure of apparent lexical trends in analyzed datasets without needing bioinformatics skills. The software is available as Perl source or as a Windows executable.

Conclusion: We have used LACK in our laboratory to generate biological hypotheses based on our microarray data. We demonstrate the program's utility using an example in which we confirm significant upregulation of the SPI-2 pathogenicity island of Salmonella enterica serovar Typhimurium by the cation chelator dipyridyl.
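The comparison to randomly generated datasets described in the Results can be illustrated with a short sketch. This is not LACK's code (LACK is distributed as Perl source and its internals are not described here); it is a minimal Python illustration of one plausible randomization scheme, assuming that random gene lists of the same size are drawn without replacement from the full annotation set and that a gene counts as a hit when its annotation contains any search term. The function names, the add-one correction, and the toy annotations are all hypothetical.

```python
import random


def count_hits(annotations, terms):
    """Count annotations that contain at least one search term (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return sum(any(t in a.lower() for t in lowered) for a in annotations)


def lexical_bias_p_value(all_annotations, selected_annotations, terms,
                         n_random=2000, seed=0):
    """Estimate how often a random gene list matches the search terms
    at least as often as the observed list (one-sided, overrepresentation).

    Assumption: random lists have the same size as the observed list and
    are sampled without replacement from all annotated genes.
    """
    rng = random.Random(seed)
    observed = count_hits(selected_annotations, terms)
    list_size = len(selected_annotations)
    at_least_as_many = 0
    for _ in range(n_random):
        sample = rng.sample(all_annotations, list_size)
        if count_hits(sample, terms) >= observed:
            at_least_as_many += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_many + 1) / (n_random + 1)


# Toy data (invented for illustration): 200 genes, 10 of them annotated "SPI-2".
genome = ["SPI-2 effector protein"] * 10 + ["hypothetical protein"] * 190
# A 20-gene "differentially regulated" list containing 8 of the SPI-2 genes.
regulated = ["SPI-2 effector protein"] * 8 + ["hypothetical protein"] * 12

observed, p_value = lexical_bias_p_value(genome, regulated, ["SPI-2"])
print(observed, p_value)
```

A small estimated p-value indicates that random gene lists of the same size rarely contain the search terms as often as the observed list does. Testing for underrepresentation would use `<=` in place of `>=` in the comparison.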
