ZODET: Software for the Identification, Analysis and Visualisation of Outlier Genes in Microarray Expression Data

Summary: Complex human diseases can show significant heterogeneity between patients with the same phenotypic disorder. An outlier detection strategy was developed to identify variants at the level of gene transcription that are of potential biological and phenotypic importance. Here we describe a graphical software package, z-score outlier detection (ZODET), that enables identification and visualisation of gross abnormalities in gene expression (outliers) in individuals, using whole-genome microarray data. The mean and standard deviation of expression in a healthy control cohort are used to detect both over- and under-expressed probes in individual test subjects. We compared the ability of ZODET to detect outlier genes in gene expression datasets with that of a previously described statistical method, gene tissue index (GTI), using a simulated expression dataset and a publicly available monocyte-derived macrophage microarray dataset. Taken together, these results support ZODET as a novel approach to identify outlier genes of potential pathogenic relevance in complex human diseases. The algorithm is implemented using R packages and Java.

Availability: The software is freely available from http://www.ucl.ac.uk/medicine/molecular-medicine/publications/microarray-outlier-analysis.
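The core idea in the summary, scoring each probe in a test subject against the mean and standard deviation of a healthy control cohort, can be illustrated with a minimal sketch. This is not the ZODET implementation (which uses R packages and Java). The function name `zscore_outliers`, the data layout, the use of the sample standard deviation, and the cut-off of |z| >= 3 are all assumptions made for illustration, and the expression values are invented.

```python
from statistics import mean, stdev


def zscore_outliers(controls, subject, threshold=3.0):
    """Flag probes in one test subject that deviate from a control cohort.

    controls:  dict mapping probe ID -> list of control expression values
    subject:   dict mapping probe ID -> the test subject's expression value
    threshold: hypothetical |z| cut-off (an assumption, not from the paper)
    """
    outliers = {}
    for probe, values in controls.items():
        # A standard deviation needs at least two control values.
        if probe not in subject or len(values) < 2:
            continue
        mu = mean(values)
        sd = stdev(values)  # sample standard deviation (assumed choice)
        if sd == 0:
            continue  # z-score undefined when controls do not vary
        z = (subject[probe] - mu) / sd
        if abs(z) >= threshold:
            direction = "over" if z > 0 else "under"
            outliers[probe] = (direction, z)
    return outliers


# Invented log-scale expression values for three hypothetical probes.
controls = {
    "probe_A": [7.9, 8.1, 8.0, 8.2, 7.8],
    "probe_B": [5.0, 5.2, 4.9, 5.1, 4.8],
    "probe_C": [10.0, 10.3, 9.8, 10.1, 9.9],
}
subject = {"probe_A": 8.05, "probe_B": 7.5, "probe_C": 8.0}

result = zscore_outliers(controls, subject)
for probe, (direction, z) in sorted(result.items()):
    print(f"{probe}: {direction}-expressed (z = {z:.1f})")
```

In this toy input, probe_B sits far above and probe_C far below the control range, so both are flagged, while probe_A falls within normal control variation and is not. A real analysis would also need to consider whether control expression is approximately normally distributed and how to correct for testing many probes at once.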
