Improving the Identification of Phenotypic Abnormalities and Sexual Dimorphism in Mice When Studying Rare Event Categorical Characteristics

Biological research frequently involves the study of phenotyping data. Many of these studies focus on rare-event categorical data, and functional genomics studies typically score the presence or absence of an abnormal phenotype. With growing interest in the role of sex, there is also a need to assess phenotypes for sexual dimorphism. Identifying abnormal phenotypes for downstream research is complicated by small sample sizes, the rare-event nature of the data, and the multiple testing problem that arises because many variables are monitored simultaneously. Here, we develop a statistical pipeline that assesses both statistical and biological significance while managing the multiple testing problem. We propose a two-step pipeline that first assesses a treatment effect (in our case example, genotype) and then tests for an interaction with sex. We compare multiple statistical methods and use simulations to investigate control of the type I error rate and power. To maximize power while addressing the multiple testing issue, we implement filters that remove data sets in which the hypotheses to be tested cannot achieve significance. A motivating case study using a large-scale, high-throughput mouse phenotyping data set from the Wellcome Trust Sanger Institute Mouse Genetics Project, where the treatment is a gene ablation, demonstrates the benefits of the new pipeline for downstream biological calls.
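The filtering idea can be illustrated for the genotype step alone. This is a minimal sketch under stated assumptions, not the pipeline described here: it assumes Fisher's exact test as the per-data-set test of a knockout versus wild-type difference in the abnormal-call rate, a filter that drops a data set when even the most extreme table with its observed margins could not reach p <= alpha, and the Benjamini-Hochberg procedure applied to the surviving tests. All function names are hypothetical, and the sex-interaction step is not modelled.

```python
# Hypothetical sketch: Fisher exact test + margin-based filter + Benjamini-Hochberg.
# Not the authors' implementation; the sex-interaction step is omitted.
from math import comb


def hypergeom_pmf(a, n_ko, n_wt, k):
    """Probability that `a` of the `k` abnormal animals are knockouts,
    given n_ko knockouts and n_wt wild types (fixed margins)."""
    return comb(n_ko, a) * comb(n_wt, k - a) / comb(n_ko + n_wt, k)


def support(n_ko, n_wt, k):
    """Feasible numbers of abnormal knockouts once the margins are fixed."""
    return range(max(0, k - n_wt), min(n_ko, k) + 1)


def fisher_two_sided(a, n_ko, n_wt, k):
    """Two-sided Fisher exact p-value: total probability of all tables no more
    likely than the observed one (small relative tolerance for ties)."""
    p_obs = hypergeom_pmf(a, n_ko, n_wt, k)
    total = 0.0
    for x in support(n_ko, n_wt, k):
        p = hypergeom_pmf(x, n_ko, n_wt, k)
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(1.0, total)


def min_attainable_p(n_ko, n_wt, k):
    """Smallest two-sided p-value any table with these margins could give."""
    return min(fisher_two_sided(a, n_ko, n_wt, k) for a in support(n_ko, n_wt, k))


def benjamini_hochberg(pvals, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            cutoff = rank  # step-up: keep the largest rank that passes
    return sorted(order[:cutoff])


def filter_then_fdr(datasets, alpha=0.05, q=0.05):
    """datasets maps a name to (abnormal_ko, n_ko, abnormal_wt, n_wt).
    Drop data sets whose margins make p <= alpha impossible, then apply
    Benjamini-Hochberg to the Fisher p-values of the survivors only."""
    kept = {}
    for name, (a, n_ko, b, n_wt) in datasets.items():
        k = a + b  # total abnormal animals: a margin, not the genotype split
        if min_attainable_p(n_ko, n_wt, k) <= alpha:
            kept[name] = fisher_two_sided(a, n_ko, n_wt, k)
    names = list(kept)
    hits = benjamini_hochberg([kept[n] for n in names], q)
    return kept, [names[i] for i in hits]


# Usage with made-up counts (illustrative only).
data = {
    "geneA": (6, 6, 0, 6),  # all knockouts abnormal, no wild types
    "geneB": (1, 6, 0, 6),  # a single abnormal animal: can never be significant
    "geneC": (3, 6, 3, 6),  # equal rates in both genotypes
}
kept, hits = filter_then_fdr(data)
print(sorted(kept), hits)  # ['geneA', 'geneC'] ['geneA']
```

Because the minimum attainable p-value depends only on the margins (group sizes and total number of abnormal animals), the filter never looks at how the abnormal calls are split between genotypes. Removing untestable data sets reduces the number of hypotheses m in the Benjamini-Hochberg thresholds rank * q / m, which is where the gain in power comes from.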
