An analysis of data sets used to train and validate cost prediction systems