Q ( QVHPEOH 5 DQN / HDUQLQJ $ SSURDFK IRU * HQH 3 ULRULWL ] DWLRQ

Several different computational approaches have been developed to solve the gene prioritization problem. We intend to use the ensemble boosting learning techniques to combine variant computational approaches for gene prioritization in order to improve the overall performance. In particular we add a heuristic weighting function to the Rankboost algorithm according to: 1) the absolute ranks generated by the adopted methods for a certain gene, and 2) the ranking relationship between all gene-pairs from each prioritization result. We select 13 known prostate cancer genes in OMIM database as training set and protein coding gene data in HGNC database as test set. We adopt the leave-one-out strategy for the ensemble rank boosting learning. The experimental results show that our ensemble learning approach outperforms the four gene-prioritization methods in ToppGene suite in the ranking results of the 13 known genes in terms of mean average precision, ROC and AUC measures.

[1]  Garrison W. Cottrell,et al.  Automatic combination of multiple ranked retrieval systems , 1994, SIGIR '94.

[2]  David Warde-Farley,et al.  GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function , 2008, Genome Biology.

[3]  Bart De Moor,et al.  Candidate gene prioritization by network analysis of differential expression using machine learning approaches , 2010, BMC Bioinformatics.

[4]  Susumu Goto,et al.  KEGG: Kyoto Encyclopedia of Genes and Genomes , 2000, Nucleic Acids Res..

[5]  Hong Zhao,et al.  PGDB: a curated and integrated database of genes related to the prostate , 2003, Nucleic Acids Res..

[6]  Bart De Moor,et al.  A guide to web tools to prioritize candidate genes , 2011, Briefings Bioinform..

[7]  Ali Masoudi-Nejad,et al.  RETRACTED ARTICLE: Candidate gene prioritization , 2012, Molecular Genetics and Genomics.

[8]  Michael Q. Zhang,et al.  Network-based global inference of human disease genes , 2008, Molecular systems biology.

[9]  Jing Chen,et al.  ToppGene Suite for gene list enrichment analysis and candidate gene prioritization , 2009, Nucleic Acids Res..

[10]  Bassem A. Hassan,et al.  Gene prioritization through genomic data fusion , 2006, Nature Biotechnology.

[11]  N. Risch Searching for genetic determinants in the new millennium , 2000, Nature.

[12]  Yoram Singer,et al.  An Efficient Boosting Algorithm for Combining Preferences by , 2013 .

[13]  Michael J. Lush,et al.  genenames.org: the HGNC resources in 2011 , 2010, Nucleic Acids Res..

[14]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[15]  Alan F. Scott,et al.  Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders , 2004, Nucleic Acids Res..

[16]  Jing Chen,et al.  Disease candidate gene identification and prioritization using protein interaction networks , 2009, BMC Bioinformatics.

[17]  Albert Y. Zomaya,et al.  A Review of Ensemble Methods in Bioinformatics , 2010, Current Bioinformatics.

[18]  Nikunj C. Oza,et al.  Online Ensemble Learning , 2000, AAAI/IAAI.