Explaining the Genetic Basis of Complex Quantitative Traits through Prediction Models

The functional characterization of genes involved in many complex traits (phenotypes) of plants, animals, or humans can be studied computationally with a variety of tools. We propose using prediction, in the machine learning sense, to search for the genetic basis of these traits. Predicting the exact value of a phenotype may be too difficult to yield a reliable model, whereas predicting an approximation in the form of an interval of values can be easier. We show that trustworthy and useful models can be obtained from this relaxed formulation. Such predictors can be built as extensions of either conventional classifiers or conventional regressors. Although prediction performance is similar in both cases, we show that the classification approach leads directly to a principled and scalable method for selecting a reduced set of features in these genetic learning tasks. Finally, we compare the results achieved on a real-world data set of barley plants with those obtained by state-of-the-art methods used in the biological literature.
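To make the idea of interval prediction from a classifier concrete, here is a minimal sketch. It is not the method of this paper, whose details are not given in the abstract. It assumes that the phenotype range has been split into ordered bins, that some probabilistic classifier has already produced posterior probabilities for each bin, and that the interval is chosen as the contiguous run of bins maximizing an expected F_beta score, (1 + beta^2) * P(y in interval) / (beta^2 + interval size). That criterion is of the kind used for set-valued (nondeterministic) classifiers. The function names `best_interval` and `bins_to_range`, the `beta` parameter, and the bin edges are all hypothetical illustrations.

```python
from typing import List, Sequence, Tuple


def best_interval(probs: Sequence[float], beta: float = 1.0) -> Tuple[int, int, float]:
    """Pick the contiguous run of bins lo..hi (inclusive) maximizing the
    expected F_beta score (1 + beta^2) * P(y in [lo, hi]) / (beta^2 + size).

    Assumption (hypothetical criterion): `probs` are posterior probabilities
    of a classifier over ordered phenotype bins; larger beta favors wider,
    safer intervals, smaller beta favors narrower, more precise ones.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    b2 = beta * beta
    best = (0, 0, -1.0)
    n = len(probs)
    for lo in range(n):
        mass = 0.0  # probability that the true bin lies in [lo, hi]
        for hi in range(lo, n):
            mass += probs[hi]
            size = hi - lo + 1
            score = (1.0 + b2) * mass / (b2 + size)
            if score > best[2]:
                best = (lo, hi, score)
    return best


def bins_to_range(edges: List[float], lo: int, hi: int) -> Tuple[float, float]:
    """Map bin indices back to phenotype values; bin i spans edges[i]..edges[i+1]."""
    return edges[lo], edges[hi + 1]


if __name__ == "__main__":
    # Hypothetical posteriors over five ordered phenotype bins.
    posteriors = [0.05, 0.40, 0.35, 0.15, 0.05]
    edges = [0, 10, 20, 30, 40, 50]  # hypothetical bin boundaries
    for beta in (1.0, 2.0):
        lo, hi, score = best_interval(posteriors, beta)
        print(beta, bins_to_range(edges, lo, hi), round(score, 3))
```

With these made-up posteriors, beta = 1 selects bins 1 to 2 (values 10 to 30), while beta = 2 widens the prediction to bins 1 to 3. This illustrates the trade-off between precision and reliability that motivates predicting intervals rather than exact values.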
