Paper Information - A Random Forest-Based Feature Selection Algorithm for the Classification of Proteomic Spectrum Data

This paper proposes a novel Random Forest-based feature selection method for mass spectrometric proteomic data. The method includes a preprocessing step that filters out the large number of redundant, highly correlated features, and then applies a tournament strategy to obtain an optimal feature subset. Experiments on three public datasets (Ovarian 4-3-02, Ovarian 7-8-02, and Prostate) show that the new method achieves high performance compared with widely used methods, with balanced specificity and sensitivity.
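The preprocessing step described above can be illustrated with a minimal sketch. The abstract does not say which correlation measure, threshold, or feature ordering the authors use, so the Pearson coefficient, the 0.95 cutoff, the greedy first-come ordering, and the names `pearson` and `filter_correlated` are all assumptions made for illustration. The tournament strategy and the Random Forest step are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def filter_correlated(columns, threshold=0.95):
    """Greedy redundancy filter (assumed scheme, not the paper's exact one).

    columns: list of feature columns, each a list of per-sample intensities.
    A feature is kept only if its absolute correlation with every feature
    kept so far is below the threshold. Returns the kept feature indices.
    """
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Toy data: three m/z features measured on four samples (made-up values).
spectra_columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.0, 8.2],  # nearly proportional to feature 0, so redundant
    [4.0, 1.0, 3.0, 2.0],
]
kept = filter_correlated(spectra_columns)
print(kept)  # [0, 2]
```

In this toy case the second feature is dropped because it is almost perfectly correlated with the first, while the third is retained. A filter of this kind shrinks the candidate set before any more expensive subset search.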

지승도 | 온승엽 | 한미영
