A Comparison of Algorithms to Find Differentially Expressed Genes in Microarray Data

Several algorithms have been published for identifying differentially expressed genes in DNA microarray experiments. Such algorithms produce ordered lists of genes. To compare the performance of these algorithms, established measures from Information Retrieval are proposed. A benchmark data set with known properties is generated and published. This benchmark data set is used to compare the performance of different algorithms with that of a new algorithm, called PUL. Surprisingly, a clear ordering in the performance of the algorithms was observed: PUL outperformed the other algorithms by a factor of two. PUL has also been applied successfully in several practical applications, and in these experiments the importance of the genes identified by PUL was independently verified.
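The abstract does not say which Information Retrieval measures are used. As a minimal sketch, assuming precision, recall and average precision are computed against the set of genes that a benchmark marks as truly differentially expressed, a ranked gene list could be scored as below. The function names, gene identifiers and the two example rankings are hypothetical. This is not the PUL algorithm and not the paper's evaluation code.

```python
from typing import Dict, Sequence, Set


def precision_recall_at_k(ranked_genes: Sequence[str], true_de: Set[str], k: int) -> Dict[str, float]:
    """Score the top-k entries of a ranked gene list against known DE genes.

    ranked_genes: gene identifiers ordered from most to least likely to be
                  differentially expressed (the output of a DE algorithm).
    true_de:      genes that the benchmark marks as truly differentially expressed.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_genes[:k]  # may be shorter than k if the list is short
    hits = sum(1 for gene in top_k if gene in true_de)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_de) if true_de else 0.0
    # Harmonic mean of precision and recall; defined as 0 when nothing was found.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def average_precision(ranked_genes: Sequence[str], true_de: Set[str]) -> float:
    """Mean of the precision values at each rank where a true DE gene appears.

    True DE genes that never appear in the ranking contribute 0, because the
    sum is divided by the total number of true DE genes.
    """
    if not true_de:
        return 0.0
    hits = 0
    total = 0.0
    for rank, gene in enumerate(ranked_genes, start=1):
        if gene in true_de:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(true_de)


if __name__ == "__main__":
    # Hypothetical benchmark: three genes are known to be differentially expressed.
    truth = {"g1", "g2", "g3"}
    ranking_a = ["g1", "g7", "g2", "g9", "g3"]  # output of hypothetical algorithm A
    ranking_b = ["g7", "g9", "g1", "g2", "g3"]  # output of hypothetical algorithm B
    for name, ranking in [("A", ranking_a), ("B", ranking_b)]:
        print(name, precision_recall_at_k(ranking, truth, k=3), average_precision(ranking, truth))
```

With these made-up inputs, ranking A reaches an average precision of about 0.76 and ranking B about 0.48, so A would be ordered ahead of B. Average precision is shown because it rewards placing the true genes near the top of the list, which fits the ordered-list output these algorithms produce; precision and recall at a fixed cutoff k are the simpler alternative when only a top-k gene list is used downstream.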
