FLPCA: A Fused Lasso PCA-based approach to identify influential markers in differentiated populations from dense SNP data

Detecting footprints of selection has been a major research interest in population genetics over the past few years, in both human and animal populations. In this work we present two methodological improvements that increase the accuracy with which selection signatures are detected. First, we show how Principal Components Analysis (PCA) and Between-Groups Analysis (BGA), two computationally efficient methods for exploring large SNP data sets and characterizing population genetic structure, can provide SNP typological values that are related to F-statistics. Second, we propose using the fused Lasso approach to identify significant footprints of selection while taking into account the spatial organization of the SNPs along the chromosomes. Previously proposed methods are often based on empirical smoothing approaches, and until now no clear recommendation has been available for choosing the significance threshold for SNP detection. A simulation study, under both a neutral model and selection, was performed to evaluate the proposed method in terms of detection power and false positive rate. As an illustration of the approach, we analyzed human haplotypes sampled from three HapMap populations and bovine data obtained from an 800K SNP chip.
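
To make the role of the spatial organization of the SNPs concrete, the standard fused Lasso criterion for an ordered sequence can be written as below. This is the generic textbook form, not necessarily the exact criterion used in this work. The notation is an assumption introduced for illustration: y_j stands for the score of SNP j (for instance a PCA- or BGA-derived typological value), the p SNPs are ordered by position along a chromosome, and lambda_1 and lambda_2 are non-negative tuning parameters.

```latex
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{p}}
  \; \frac{1}{2} \sum_{j=1}^{p} \left( y_j - \beta_j \right)^2
  \;+\; \lambda_1 \sum_{j=1}^{p} \left| \beta_j \right|
  \;+\; \lambda_2 \sum_{j=2}^{p} \left| \beta_j - \beta_{j-1} \right|
```

The first penalty shrinks individual coefficients to zero, and the second penalizes differences between neighbouring SNPs, so the solution tends to be piecewise constant along the chromosome, with contiguous non-zero segments as candidate regions.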