Empirical comparison of scoring rules at early stages of CAT