Detecting population structures by independent component analysis

In genome-wide association studies (GWAS), detecting population stratification is one of the main issues. Because population stratification can produce spurious associations in GWAS, hidden structure must be found and individuals assigned to subpopulations before the association analysis. We propose an exploratory approach to population structure analysis based on independent component analysis (ICA). ICA is mainly used for blind source separation, which attempts to recover the individual signals from a mixture of several signals. It can handle non-Gaussian data and exploits higher-order moments. To determine the population structure, we first reduce the dimensionality of the samples by projecting the data onto a lower-dimensional subspace constructed by ICA. The samples are then bisected using fuzzy clustering. By repeating this procedure until a predetermined stopping criterion is met, we can detect the population structure and assign individuals to subpopulations. The procedure also provides information about the optimal number of subpopulations. To assess the proposed method, we analyze simulated genotypic data.
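The bisection step can be illustrated with a minimal sketch. It assumes the samples have already been projected onto a few ICA components, and it uses fuzzy c-means with two clusters and fuzzifier m = 2 as one possible choice of fuzzy clustering; the abstract does not specify the variant. The function name `fuzzy_bisect`, its parameters, and the farthest-point initialisation are hypothetical and chosen only for illustration.

```python
def fuzzy_bisect(points, m=2.0, iters=100, tol=1e-9):
    """Split points (lists of floats) into two groups with fuzzy c-means, c = 2.

    Assumption: `points` are coordinates in an ICA subspace computed elsewhere.
    Returns hard labels, fuzzy memberships, and the two cluster centers.
    """
    dim = len(points[0])

    def dist2(a, b):
        # Squared Euclidean distance between two points.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Deterministic start: the point farthest from the mean, then the
    # point farthest from that one (an illustrative choice, not the paper's).
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    first = max(points, key=lambda p: dist2(p, mean))
    second = max(points, key=lambda p: dist2(p, first))
    centers = [list(first), list(second)]

    for _ in range(iters):
        # Membership update: u_k proportional to D_k^(-1/(m-1)),
        # where D_k is the squared distance to center k.
        memberships = []
        for p in points:
            inv = [max(dist2(p, c), 1e-12) ** (-1.0 / (m - 1.0)) for c in centers]
            total = sum(inv)
            memberships.append([v / total for v in inv])

        # Center update: mean of the points weighted by u^m.
        new_centers = []
        for k in range(2):
            w = [u[k] ** m for u in memberships]
            wsum = sum(w)
            new_centers.append(
                [sum(wi * p[d] for wi, p in zip(w, points)) / wsum for d in range(dim)]
            )

        shift = max(dist2(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break

    # Hard assignment: each sample goes to the cluster with larger membership.
    labels = [0 if u[0] >= u[1] else 1 for u in memberships]
    return labels, memberships, centers


# Toy usage: six samples in a two-dimensional subspace.
samples = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
           [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, memberships, centers = fuzzy_bisect(samples)
print(labels)
```

In a recursive scheme of the kind the abstract describes, each resulting group would be projected and bisected again until the chosen stopping criterion is met. The membership values could feed such a criterion, but that choice is an assumption here and is not taken from the source.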
