Background: Hypertension is a prevalent condition linked to major cardiovascular conditions and multiple other comorbidities. Genetic information can offer a deeper understanding of susceptibility and of the underlying disease mechanisms. The Genetic Analysis Workshop 18 (GAW18) provides abundant genotype data for identifying genetic associations with hypertension status and with the underlying trait of systolic blood pressure (SBP). The high-dimensional nature of these data motivates dimension reduction techniques that remove excess noise and synthesize genetic information for complex, polygenic traits.

Methods: For both measured and simulated phenotype data from GAW18, we use sparse principal component analysis to obtain sparse genetic profiles that represent the underlying data structures. We then test for associations between the resulting sparse principal components (PCs) and SBP, a major indicator of hypertension, and examine the sparse PCs for genetic structure to gain insight into new patterns.

Results: After adjusting for multiple testing, 27 of 122 PCs were significantly associated with measured SBP, offering a large number of components to investigate. Among the top 3 PCs, we identified linked genetic regions that may act in unison in their association with SBP. Simulated data gave similar results.

Conclusions: Sparse PCs offer a new data-driven approach to structuring genotype data and understanding the genetic mechanisms behind complex, polygenic traits such as hypertension.
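The workflow described in the Methods (sparse loadings, then PC scores, then association with SBP under a multiple-testing adjustment) can be illustrated with a minimal sketch. The abstract does not state which sparse PCA algorithm, association test, or adjustment was used, so everything below is an assumption made for illustration: the loading vector comes from a soft-thresholded power iteration (in the spirit of penalized matrix decomposition approaches), the association measure is a Pearson correlation with a permutation p-value, and the adjustment is Bonferroni over the 122 PCs. The toy genotype matrix, the SBP values, the penalty `lam`, and all function names are hypothetical.

```python
import math
import random

def soft_threshold(values, lam):
    """Shrink each entry toward zero by lam; small entries become exactly zero."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in values]

def normalize(values):
    """Scale to unit Euclidean length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in values))
    return [x / norm for x in values] if norm > 0 else list(values)

def sparse_loadings(X, lam, n_iter=50):
    """First sparse loading vector of a column-centred matrix X (rows = subjects)."""
    n, p = len(X), len(X[0])
    v = normalize([1.0] * p)
    for _ in range(n_iter):
        # Subject scores for the current loadings, scaled to unit length.
        u = normalize([sum(X[i][j] * v[j] for j in range(p)) for i in range(n)])
        # Project back onto the markers, then penalize to induce sparsity.
        w = [sum(X[i][j] * u[i] for i in range(n)) for j in range(p)]
        v = normalize(soft_threshold(w, lam))
    return v

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    r_obs = abs(pearson(x, y))
    hits = sum(
        abs(pearson(x, rng.sample(y, len(y)))) >= r_obs - 1e-12
        for _ in range(n_perm)
    )
    return (1 + hits) / (1 + n_perm)

# Hypothetical, column-centred toy data: 6 subjects x 4 markers.
# Markers 1 and 2 vary together; markers 3 and 4 are low-variance noise.
X = [
    [-2.0, -2.0,  0.1, -0.1],
    [-1.0, -1.0, -0.1,  0.1],
    [ 0.0,  0.0,  0.1,  0.1],
    [ 0.0,  0.0, -0.1, -0.1],
    [ 1.0,  1.0,  0.1, -0.1],
    [ 2.0,  2.0, -0.1,  0.1],
]
sbp = [110.0, 118.0, 125.0, 127.0, 134.0, 141.0]  # hypothetical SBP values

loadings = sparse_loadings(X, lam=0.5)  # the two noise markers get zero loading
scores = [sum(row[j] * loadings[j] for j in range(len(row))) for row in X]
r_obs = pearson(scores, sbp)
p_value = permutation_p_value(scores, sbp)

# Bonferroni threshold if 122 PCs were each tested at an overall 0.05 level.
alpha_adjusted = 0.05 / 122
print(loadings, r_obs, p_value, p_value < alpha_adjusted)
```

In this sketch the penalty `lam` controls sparsity: larger values zero out more markers, which is what makes each component readable as a small set of linked regions. A value large enough to zero out every marker leaves no component to test, so `lam` would need to be tuned in any real analysis.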