A Comparison of Parametric and Semi-Parametric Models for Microarray Data Analysis

Microarray technology has revolutionized genomic studies by making it possible to measure the differential expression of thousands of genes simultaneously. Parametric, nonparametric and semi-parametric statistical methods have been proposed for gene selection within the last sixteen years. In an effort to find the "gold standard", the performance of some common parametric and nonparametric methods has been compared in terms of power to select differentially expressed genes and other desirable properties. However, no such comparisons have been conducted between parametric and semi-parametric models. In this study, we compared a semi-parametric model based on copulas with a parametric model (the quantitative trait analysis, or QTA, model) in terms of power and the ability to control the Type I error rate. In addition, we proposed a simple algorithm for choosing an optimal copula. The two approaches were applied to a publicly available melanoma cell lines dataset for validation. Both methods performed well in terms of power, but the copula approach performed notably better. In terms of Type I error rate control, the two methods were comparable. More methods for selecting an optimal copula for gene expression data need to be developed, as the proposed procedure is limited to copulas that permit both negative and positive dependence.
