Boosted PRIM with application to searching for oncogenic pathway of lung cancer

Boosted PRIM (patient rule induction method) is a new algorithm for two-class classification problems. PRIM is a variant of tree-based methods that seeks box-shaped regions in the feature space to separate the classes. Boosted PRIM uses PRIM-style weak learners within AdaBoost, one of the most popular boosting algorithms. In addition, we improve the performance of the algorithm by adding regularization to the boosting process, an approach that supports Jerry Friedman's view of boosting as steepest-descent numerical optimization. Boosted PRIM was motivated by the problem of "searching for oncogenic pathways" in array-CGH (comparative genomic hybridization) data, although the algorithm itself is suitable for general classification problems. We illustrate the performance of the method through simulation studies and an application to a lung cancer array-CGH data set.
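To make the idea of box-shaped weak learners inside AdaBoost concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified greedy peeling step (no pasting), labels coded as -1/+1, and a shrinkage factor `nu` on each learner's weight as one possible form of regularization; the abstract does not specify these details. All names (`fit_box`, `box_predict`, `boosted_prim`, `decision`, `peel`, `min_mass`, `nu`) and the synthetic data are hypothetical.

```python
import numpy as np

def fit_box(X, y, w, peel=0.1, min_mass=0.1):
    """Simplified PRIM-style peeling (assumed variant): shrink a box
    one face at a time to raise the weighted mean of y inside it."""
    n, p = X.shape
    lo, hi = X.min(axis=0).copy(), X.max(axis=0).copy()
    inside = np.ones(n, dtype=bool)

    def wmean(mask):
        return np.sum(w[mask] * y[mask]) / np.sum(w[mask])

    while True:
        best = None
        for j in range(p):
            xj = X[inside, j]
            cuts = (("lo", np.quantile(xj, peel)),
                    ("hi", np.quantile(xj, 1 - peel)))
            for side, cut in cuts:
                keep = inside & (X[:, j] >= cut if side == "lo" else X[:, j] <= cut)
                # Skip peels that remove nothing or leave too little weight.
                if keep.sum() == inside.sum() or w[keep].sum() < min_mass * w.sum():
                    continue
                score = wmean(keep)
                if best is None or score > best[0]:
                    best = (score, j, side, cut, keep)
        # Stop when no candidate peel strictly improves the box mean.
        if best is None or best[0] <= wmean(inside):
            break
        _, j, side, cut, inside = best
        if side == "lo":
            lo[j] = cut
        else:
            hi[j] = cut
    return lo, hi

def box_predict(X, lo, hi):
    """Predict +1 inside the box and -1 outside."""
    return np.where(np.all((X >= lo) & (X <= hi), axis=1), 1, -1)

def boosted_prim(X, y, rounds=20, nu=0.5):
    """AdaBoost with box weak learners; nu < 1 shrinks each step
    (an assumed form of regularization)."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        lo, hi = fit_box(X, y, w)
        h = box_predict(X, lo, hi)
        err = np.clip(np.sum(w[h != y]), 1e-10, 1 - 1e-10)
        alpha = nu * 0.5 * np.log((1 - err) / err)
        model.append((alpha, lo, hi))
        w = w * np.exp(-alpha * y * h)  # upweight misclassified points
        w /= w.sum()
    return model

def decision(model, X):
    """Weighted vote of all boxes; the sign gives the predicted class."""
    return sum(a * box_predict(X, lo, hi) for a, lo, hi in model)

# Synthetic example: the positive class occupies a central square.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = np.where((np.abs(X[:, 0]) < 0.5) & (np.abs(X[:, 1]) < 0.5), 1, -1)
model = boosted_prim(X, y)
F = decision(model, X)
print("training accuracy:", np.mean(np.sign(F) == y))
```

With `nu = 1` the update reduces to the usual AdaBoost step size; smaller values take shorter steps along the same direction, which is the sense in which shrinkage fits the steepest-descent view of boosting.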