Floating Feature Selection for multiloci association of quantitative traits in sib-pairs analysis

Finding associations between genotypic differences and disease traits has become one of the main objectives of current genetic research. Several of the factors underlying the dynamics of the coagulation process have been reported to have a genetic component and to show significant heritability. Factor VII (FVII) is one such case. In this work, we propose a method for selecting sets of Single Nucleotide Polymorphisms (SNPs) of the F7 gene that are significantly associated with the phenotype (FVII levels). The methodology is applied to the sib pairs in the GAIT (Genetic Analysis of Idiopathic Thrombophilia) project sample. The method is an adapted Sequential Floating Feature Selection (SFFS) algorithm, applied with two relevance criteria, one linear and one nonlinear. The SNP sets found with linear models are contained in those found with nonlinear techniques. The results are consistent with previous findings in the clinical literature.
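The selection loop can be illustrated with a minimal Python sketch of a floating forward search with a pluggable relevance criterion. It is not the authors' adapted algorithm. The linear criterion here is assumed to be the R² of an ordinary least-squares fit, and the paper's nonlinear criterion is not reproduced. The names `sffs` and `linear_r2`, the 0/1/2 genotype coding, and the toy data are hypothetical. In a sib-pair setting each row could be one sib pair, with features encoding per-pair genotype differences and `y` the trait difference, but that encoding is also an assumption.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

# A criterion maps a set of SNP (column) indices to a relevance score.
Criterion = Callable[[FrozenSet[int]], float]


def sffs(n_features: int, criterion: Criterion, target_size: int) -> Tuple[List[int], float]:
    """Floating forward search: greedy additions, each followed by backward
    removals that are kept only if they strictly beat the best subset already
    recorded at the smaller size (this guarantees termination)."""
    if not 1 <= target_size <= n_features:
        raise ValueError("target_size must be between 1 and n_features")
    selected: FrozenSet[int] = frozenset()
    best: Dict[int, Tuple[float, FrozenSet[int]]] = {}  # size -> (score, subset)
    while len(selected) < target_size:
        # Forward step: add the feature that maximizes the criterion.
        candidates = [f for f in range(n_features) if f not in selected]
        add = max(candidates, key=lambda f: criterion(selected | {f}))
        selected = selected | {add}
        score = criterion(selected)
        if len(selected) not in best or score > best[len(selected)][0]:
            best[len(selected)] = (score, selected)
        # Floating steps: drop the least useful feature while that helps.
        while len(selected) > 2:
            drop = max(selected, key=lambda f: criterion(selected - {f}))
            reduced = selected - {drop}
            reduced_score = criterion(reduced)
            if reduced_score <= best[len(reduced)][0]:
                break
            selected = reduced
            best[len(reduced)] = (reduced_score, reduced)
    score, subset = best[target_size]
    return sorted(subset), score


def linear_r2(X: Sequence[Sequence[float]], y: Sequence[float]) -> Criterion:
    """Hypothetical linear criterion: R^2 of an OLS fit of y on the selected
    columns of X plus an intercept."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)

    def criterion(cols: FrozenSet[int]) -> float:
        rows = [[1.0] + [float(x[c]) for c in sorted(cols)] for x in X]
        p = len(rows[0])
        # Normal equations (R^T R) coef = R^T y, solved by Gaussian elimination.
        A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
        v = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
        for i in range(p):
            piv = max(range(i, p), key=lambda k: abs(A[k][i]))
            A[i], A[piv] = A[piv], A[i]
            v[i], v[piv] = v[piv], v[i]
            if abs(A[i][i]) < 1e-12:
                return float("-inf")  # singular design, e.g. collinear SNPs
            for k in range(i + 1, p):
                factor = A[k][i] / A[i][i]
                A[k] = [a - factor * b for a, b in zip(A[k], A[i])]
                v[k] -= factor * v[i]
        coef = [0.0] * p
        for i in reversed(range(p)):
            coef[i] = (v[i] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
        ss_res = sum((yi - sum(c * ri for c, ri in zip(coef, r))) ** 2
                     for r, yi in zip(rows, y))
        return 1.0 - ss_res / ss_tot

    return criterion


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 200 rows, 5 SNPs coded 0/1/2; only SNPs 0 and 2 drive the trait.
    X = [[rng.choice([0, 1, 2]) for _ in range(5)] for _ in range(200)]
    y = [2.0 * x[0] - 1.5 * x[2] for x in X]
    subset, score = sffs(5, linear_r2(X, y), target_size=2)
    print(subset, round(score, 6))
```

Because the search only compares subsets of equal size, plain R² is enough to rank candidates; choosing the subset size itself would need a penalized criterion or a significance test. A nonlinear criterion, for example mutual information between the SNPs and a discretized trait, could be passed in place of `linear_r2` without changing `sffs`.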
