Model Reuse with Bike Rental Station Data (Preamble)
1 The challenge

Adaptive reuse of learnt knowledge is of critical importance in the majority of knowledge-intensive application areas, particularly when the context in which the learnt model operates can be expected to vary from training to deployment. In the MoReBikeS challenge (Model Reuse with Bike Rental Station Data), which we organised as the ECML-PKDD 2015 Discovery Challenge #1, we decided to focus on model reuse and context change. In contrast to most machine learning challenges to date, we provided the participants not only with data but also with pre-trained models representing knowledge to be reused.

The MoReBikeS challenge was based on historical bicycle rental data obtained from Valencia, Spain. Bicycles are continuously taken from and returned to rental stations across the city. Because of the patterns in demand, some stations can become empty or full, so that no more bikes can be rented or returned. To reduce how often this happens, the rental company has to move bikes from full or nearly full stations to empty or nearly empty stations. This can be done more efficiently if the numbers of bikes in the stations are predicted some hours in advance. The quality of such predictions relies heavily on usage recorded over long periods of time. Therefore, the prediction quality for newly opened stations is necessarily lower.

In this challenge the participants were required to predict the number of bikes 3 hours in advance at 75 stations with a short (1 month) recorded history, by making use of the provided predictive linear models trained on 200 other stations with a longer history (more than 2 years). To promote reuse of the provided models, we did not reveal the long historical data to the participants (except for 10 stations out of the 200). The predictions were evaluated using the mean absolute error in the number of bikes every hour over the following period of 3 months.
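To make the evaluation measure and the idea of model reuse concrete, the following is a minimal sketch in Python. It computes the mean absolute error and then picks, for a new station, whichever pre-trained linear model has the lowest error on that station's short history. This selection rule is only an illustrative baseline and is not a method described by the challenge organisers; the model names, weights, features and bike counts are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# A linear model is stored as (weights, intercept). This layout is an
# assumption made for the sketch, not the format used in the challenge.
LinearModel = Tuple[Sequence[float], float]


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |actual - predicted| over all time points."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def linear_predict(model: LinearModel, x: Sequence[float]) -> float:
    """Prediction of one linear model for one feature vector."""
    weights, intercept = model
    return intercept + sum(w * v for w, v in zip(weights, x))


def pick_model_by_mae(
    models: Dict[str, LinearModel],
    X: List[Sequence[float]],
    y: Sequence[float],
) -> Tuple[str, float]:
    """Return the name and MAE of the model that fits (X, y) best."""
    scores = {
        name: mean_absolute_error(y, [linear_predict(m, x) for x in X])
        for name, m in models.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical pre-trained models from two long-history stations.
models = {
    "station_A": ([1.0], 0.0),
    "station_B": ([0.5], 2.0),
}
# Hypothetical short history of a new station: one feature per row
# (bikes now) and the target (bikes 3 hours later).
X = [[4.0], [6.0], [10.0]]
y = [4.0, 5.0, 7.0]

best_name, best_mae = pick_model_by_mae(models, X, y)
print(best_name, best_mae)
```

On this toy data the model from station_B reproduces the targets exactly, so it is the one selected. A real entry would need richer features and would likely combine or adapt several models rather than choose a single one.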
A common problem in evaluating time-series predictors is temporal dependence: the features of later test time points leak information about earlier test time points. We tackled this problem by running the challenge in two stages: the leaderboard challenge and the full test challenge. In the leaderboard