Multi-view Ensemble Classification for Clinically Actionable Genetic Mutations

This paper presents our solution to Task IV of the NIPS 2017 Competition Track, Classifying Clinically Actionable Genetic Mutations. The task is to classify genetic mutations based on text evidence from the clinical literature. We propose a novel multi-view machine learning framework with ensemble classification models to solve this problem. During the challenge, we comprehensively explored feature combinations derived from three complementary views: the document view, the entity text view, and the entity name view. In the final comparison, an ensemble of 9 basic gradient boosting models performed best. Our approach scored 0.5506 in logarithmic loss on a fixed split of the stage-1 testing phase and 0.6694 under 5-fold cross-validation, placing the team in the top 3 of NIPS 2017 Competition Track IV.
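
As a rough illustration of the two ideas named in the abstract (combining several models into an ensemble and scoring the result with logarithmic loss), the sketch below averages per-class probabilities from two toy models and computes the multi-class log loss. The abstract does not say how the 9 gradient boosting models are combined, so the unweighted average, the function names, and the three-class toy data are all assumptions made for illustration only.

```python
import math


def average_probabilities(model_outputs):
    """Unweighted average of per-class probabilities across models.

    model_outputs[m][i][k] is model m's probability that sample i
    belongs to class k. Unweighted averaging is an assumption here,
    not necessarily the combination rule used in the paper.
    """
    n_models = len(model_outputs)
    n_samples = len(model_outputs[0])
    n_classes = len(model_outputs[0][0])
    return [
        [sum(m[i][k] for m in model_outputs) / n_models for k in range(n_classes)]
        for i in range(n_samples)
    ]


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to each sample's true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0) when a model is confidently wrong.
        p = min(max(row[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Hypothetical toy predictions: 2 models, 2 samples, 3 classes.
model_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
model_b = [[0.5, 0.3, 0.2], [0.3, 0.4, 0.3]]
y_true = [0, 1]

ensemble = average_probabilities([model_a, model_b])
loss = multiclass_log_loss(y_true, ensemble)
print(f"ensemble log loss: {loss:.4f}")
```

Lower values are better: a loss of 0 would mean every sample's true class received probability 1, which is why the reported scores of 0.5506 and 0.6694 are compared on the same scale.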

Fei Wang | Xi Zhang | Xu Min | Chang Su | Sendong Zhao | Chao Che | Dandi Chen | Yongjun Zhu
