The SgenoLasso and its cousins for selective genotyping and extreme sampling: application to association studies and genomic selection

We introduce a new variable selection method, called SgLasso, that handles extreme data and is suitable when the correlation between regressors is known. It is appropriate in genomics since, once the genetic map has been built, the correlation is perfectly known. Moreover, we prove that the signal-to-noise ratio is substantially increased by considering the extremes. Our method relies on the construction of a specific statistical test, a transformation of the data, and knowledge of the correlation between regressors. This new technique is inspired by stochastic processes arising from statistical genetics. We compare our approach with existing methods on simulated and real data, and the results support the validity of our approach.