Classification of Tutor System Logs with High Categorical Features

In this paper we present our method for solving the KDD Cup 2010 problem. We did not perform a thorough literature review; instead, we reinvented all of the ideas from scratch. The problem is to predict student learning from the logs of tutor systems, which contain a very large number of instances. In the preprocessing stage we deleted features not present in the test dataset and created some new features. Another preprocessing step transformed categorical features into numeric ones. We used very naive sampling to deal with the large number of instances. Despite using only 3 of the 22 features and standard decision tree and regression algorithms, the results are acceptable. Even though we made many simplifications, ignored many interrelationships among features, and did not use the whole training data, our team, Y10, reached 4th place among student teams and 15th place overall.
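The abstract does not say which categorical-to-numeric transformation or which sampling scheme was used. As a hedged illustration only, the sketch below assumes two common choices: a smoothed per-category mean encoding of a binary target (useful for high-cardinality IDs such as student or step names) and uniform random row sampling. The function names, the `smoothing` parameter, and the toy data are hypothetical and are not taken from the original method.

```python
import random
from collections import defaultdict


def sample_rows(rows, fraction, seed=0):
    """Naive uniform sampling (assumed scheme): keep each row with probability `fraction`.

    Row order is preserved; a fixed seed makes the sample reproducible.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


def fit_mean_encoding(values, targets, smoothing=1.0):
    """Map each category to a smoothed mean of a binary (0/1) target.

    Hypothetical encoder: each category's mean is pulled toward the global
    mean (`prior`) by `smoothing` pseudo-observations, so rare categories
    do not get extreme values.
    """
    if not targets:
        raise ValueError("need at least one training example")
    prior = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, target in zip(values, targets):
        sums[value] += target
        counts[value] += 1
    table = {
        value: (sums[value] + smoothing * prior) / (counts[value] + smoothing)
        for value in counts
    }
    return table, prior


def apply_encoding(values, table, prior):
    """Replace categories with their encoded numbers; unseen categories get the prior."""
    return [table.get(value, prior) for value in values]


if __name__ == "__main__":
    # Toy data (hypothetical): student IDs and whether a step was answered correctly.
    students = ["s1", "s1", "s2", "s2", "s2", "s3"]
    correct = [1, 0, 1, 1, 1, 0]
    table, prior = fit_mean_encoding(students, correct)
    print(apply_encoding(["s2", "s4"], table, prior))  # "s4" is unseen, so it maps to the prior
    print(sample_rows(list(range(10)), 0.5, seed=1))
```

The encoded columns are plain numbers, so they can be fed directly to an off-the-shelf decision tree or regression model; smoothing toward the global mean is one way to keep categories with few observations from dominating the splits.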