Learning classifiers from discretized expression quantitative trait loci

Expression quantitative trait loci are used as a tool to identify genetic causes of natural variation in gene expression. Only in a few cases is the expression of a gene controlled by a variant at a single marker. There is a plethora of different levels of complexity in interaction effects within markers, within genes, and between markers and genes. This complexity challenges biostatisticians and bioinformaticians every day and makes findings difficult to obtain. As a way to simplify analysis and better control confounders, we tried a new approach to association analysis between genotypes and expression data. We sought to understand whether discretization of expression data can be useful in genome-transcriptome association analyses. Discretizing the dependent variable allowed us to use algorithms for learning classifiers from data, as well as for performing block selection, to help understand the relationship between the expression of a gene and genetic markers. We present the results of a first set of studies in which we used this approach to detect new possible causes of expression variation of DRB5, a gene that plays an important role in the immune system. A supplementary website, including a link to the software implementing the method, can be found at http://bios.ugr.es/classDRB5.
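The abstract does not state how expression values were discretized or which classifiers were learned. The sketch below illustrates the general idea under several assumptions. Expression values are split into equal-frequency classes (here two: "low" and "high"). Genotypes are coded as 0, 1 or 2 copies of one allele. A simple one-rule (1R) classifier then picks the single marker whose genotypes best predict the expression class. The 1R classifier is a stand-in chosen for brevity, not the authors' method. All function names, marker names and data values are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical genotype coding: number of copies of one allele (0, 1 or 2).
Genotype = int


def discretize_equal_frequency(values: Sequence[float], n_bins: int = 2) -> List[int]:
    """Turn continuous expression values into class labels 0..n_bins-1.

    Individuals are ranked by expression and split into bins of (roughly)
    equal size; tied values may fall into different bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * n_bins // len(values)
    return labels


def one_rule(genotypes: Sequence[Genotype],
             labels: Sequence[int]) -> Tuple[Dict[Genotype, int], float]:
    """Fit a 1R rule for one marker: predict the majority class of each genotype.

    Returns the rule and its accuracy on the same (training) data.
    """
    counts: Dict[Genotype, Counter] = defaultdict(Counter)
    for g, y in zip(genotypes, labels):
        counts[g][y] += 1
    rule = {g: c.most_common(1)[0][0] for g, c in counts.items()}
    correct = sum(1 for g, y in zip(genotypes, labels) if rule[g] == y)
    return rule, correct / len(labels)


def best_marker(markers: Dict[str, Sequence[Genotype]],
                labels: Sequence[int]) -> Optional[Tuple[str, Dict[Genotype, int], float]]:
    """Return (marker name, rule, training accuracy) for the most predictive marker."""
    best = None
    for name, genotypes in markers.items():
        rule, acc = one_rule(genotypes, labels)
        if best is None or acc > best[2]:
            best = (name, rule, acc)
    return best


# Toy data for six individuals (made up for illustration, not real measurements).
expression = [1.2, 3.4, 0.8, 4.1, 2.9, 0.5]
labels = discretize_equal_frequency(expression, n_bins=2)  # 0 = low, 1 = high
markers = {
    "snpA": [0, 2, 0, 2, 1, 0],
    "snpB": [1, 0, 1, 2, 0, 2],
}
name, rule, acc = best_marker(markers, labels)
print(name, rule, acc)
```

Equal-frequency binning keeps the classes balanced. This prevents a classifier from looking accurate merely by always predicting the majority class. With real data, accuracy should be estimated on held-out individuals (for example by cross-validation) rather than on the training set, as done in this toy example.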
