Bayesian growth curve model useful for high-dimensional longitudinal data

ABSTRACT Traditional inference on the growth curve model (GCM) requires ‘small p large n’ and cannot be applied in high-dimensional scenarios, where we often encounter singularity. Several methods have been proposed to tackle the singularity problem; however, limitations and gaps remain. We consider a Bayesian framework to derive a statistic for testing a linear hypothesis on the GCM. Extensive simulations are performed to investigate its performance and establish its optimality characteristics. We show that the test overcomes the challenge of high dimensionality and possesses the desirable optimality characteristics of a good test: it is unbiased, symmetric, and monotone with respect to sample size and departure from the null hypothesis. The results also indicate that the test performs very well, with a level close to the nominal value and high power to detect small departures from the null, and that it overcomes limitations of a previously proposed test. We illustrate practical application using publicly available time-course genetic data on breast cancer, where we use our test statistic for gene filtering. The genes were ranked according to the value of the test statistic, and the top five genes were annotated.
