Examining the relative influence of familial, genetic, and environmental covariate information in flexible risk models

We present a method for examining the relative influence of familial, genetic, and environmental covariate information in flexible nonparametric risk models. Our goal is to investigate the relative importance of these three sources of information in their association with a particular outcome. To that end, we developed a method for incorporating arbitrary pedigree information into a smoothing spline ANOVA (SS-ANOVA) model. By expressing pedigree data as a positive semidefinite kernel matrix, the SS-ANOVA model is able to estimate a log-odds ratio as a multicomponent function of several variables: one or more functional components representing information from environmental covariates and/or genetic marker data, and another representing pedigree relationships. We report a case study on models for retinal pigmentary abnormalities in the Beaver Dam Eye Study. Our model verifies known facts about the epidemiology of this eye lesion, which is found in eyes with early age-related macular degeneration, and shows significantly increased predictive ability in models that include all three of the genetic, environmental, and familial data sources. The case study also shows that models containing only two of these data sources (pedigree and environmental covariates, pedigree and genetic markers, or environmental covariates and genetic markers) have predictive ability comparable to one another, but lower than that of the model with all three. This result is consistent with the notions that genetic marker data encode pedigree data, at least in part, and that familial correlations encode shared environment data as well.
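
To make the idea of "pedigree data as a positive semidefinite kernel matrix" concrete, the following is a minimal sketch, not the construction used in this work (the abstract does not specify it). It assumes the standard additive relationship matrix, whose entries are twice the Malécot kinship coefficient, as one well-known pedigree-derived matrix that is positive semidefinite. The function name `relationship_matrix` and the toy pedigree are hypothetical, introduced only for illustration.

```python
import numpy as np

def relationship_matrix(pedigree):
    """Additive relationship matrix (twice the kinship coefficient).

    pedigree: list of (id, father_id, mother_id) tuples, with parents
    listed before their children and None for an unknown parent.
    This is the classical tabular method, used here as an assumed
    stand-in for a pedigree kernel.
    """
    index = {pid: k for k, (pid, _, _) in enumerate(pedigree)}
    n = len(pedigree)
    A = np.zeros((n, n))
    for i, (_, father, mother) in enumerate(pedigree):
        f = index.get(father)  # None when the parent is unknown
        m = index.get(mother)
        # Relationship to each earlier individual: average over the parents.
        for j in range(i):
            a = 0.0
            if f is not None:
                a += 0.5 * A[j, f]
            if m is not None:
                a += 0.5 * A[j, m]
            A[i, j] = A[j, i] = a
        # Self-relationship: 1 plus the inbreeding coefficient.
        inbreeding = 0.5 * A[f, m] if f is not None and m is not None else 0.0
        A[i, i] = 1.0 + inbreeding
    return A

# Toy nuclear family: two unrelated founders and their two children.
toy_pedigree = [
    ("dad", None, None),
    ("mom", None, None),
    ("sib1", "dad", "mom"),
    ("sib2", "dad", "mom"),
]
K = relationship_matrix(toy_pedigree)
min_eigenvalue = np.linalg.eigvalsh(K).min()  # nonnegative for a PSD matrix
```

A matrix like `K` can serve as the Gram matrix of a pedigree term in a kernel-based model, alongside separate terms for environmental covariates and genetic markers; how the pedigree kernel is actually built and weighted in the SS-ANOVA model is described in the full paper, not here.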
