Feature Engineering and Classifier Ensemble for KDD Cup 2010

KDD Cup 2010 is an educational data mining competition. Participants are asked to learn a model from students' past behavior and then predict their future performance. At National Taiwan University, we organized a course for this competition. Most student sub-teams expanded features using various binarization and discretization techniques. The resulting sparse feature sets were used to train logistic regression models (using LIBLINEAR). One sub-team considered condensed features derived with simple statistical techniques and applied Random Forest (through Weka) for training. Initial development used an internal split of the training data into training and validation sets. We identified some useful feature combinations that improved performance. For the final submission, we combined the results of the student sub-teams by regularized linear regression. Our team is the first prize winner of both tracks (all teams and student teams) of KDD Cup 2010.
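
The final combination step can be illustrated with a minimal sketch, assuming an L2 (ridge) penalty and a closed-form solution of the normal equations. The function names, the toy validation predictions, the value of `lam`, and the clipping of the combined score to [0, 1] are all illustrative assumptions, not the team's actual code or settings.

```python
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge_ensemble(preds: List[List[float]], labels: List[float],
                       lam: float) -> List[float]:
    """Fit ensemble weights w minimizing ||P w - y||^2 + lam * ||w||^2.

    preds[i][k] is the prediction of sub-model k on validation example i.
    """
    k = len(preds[0])
    # Normal equations: (P^T P + lam * I) w = P^T y
    gram = [[sum(row[a] * row[b] for row in preds) + (lam if a == b else 0.0)
             for b in range(k)] for a in range(k)]
    rhs = [sum(row[a] * y for row, y in zip(preds, labels)) for a in range(k)]
    return solve_linear_system(gram, rhs)


def combine(weights: List[float], row: List[float]) -> float:
    """Weighted sum of sub-model predictions, clipped to [0, 1] (assumption)."""
    score = sum(w * p for w, p in zip(weights, row))
    return min(1.0, max(0.0, score))


# Toy validation predictions from two hypothetical sub-models (made-up numbers).
val_preds = [[0.9, 0.7], [0.2, 0.4], [0.8, 0.6], [0.1, 0.3]]
val_labels = [1.0, 0.0, 1.0, 0.0]
weights = fit_ridge_ensemble(val_preds, val_labels, lam=0.1)
print(weights, combine(weights, [0.85, 0.65]))
```

In this sketch the weights are fitted on held-out validation predictions rather than on training predictions, which mirrors the internal training/validation split described above; the penalty `lam` keeps the weights stable when sub-model predictions are highly correlated.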