LIMSI Submission for WMT'17 Shared Task on Bandit Learning

This paper describes LIMSI's participation in the WMT’17 shared task on Bandit Learning. The method we propose to adapt a seed system trained on out-of-domain data to a new, unknown domain relies on two components. First, we use a linear regression model to exploit the weak and partial feedback the system receives by learning to predict the reward a translation hypothesis will get. This model can then be used to score hypotheses in the search space and translate source sentences while taking into account the specificities of the in-domain data. Second, we use the UCB1 algorithm to choose which of the ‘adapted’ and ‘seed’ systems should translate a given source sentence in order to maximize the cumulative reward. Results on the development and training sets show that the proposed method does not improve on the seed system. We explore several hypotheses to explain this negative result.
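To make the two components concrete, the following is a minimal sketch, not the submitted system. It assumes rewards lie in [0, 1] (the usual setting for UCB1), that each hypothesis is represented by a fixed-length feature vector, and that the regression model is trained by plain stochastic gradient descent; the abstract does not specify the features or the training procedure. All class, function, and parameter names (`UCB1`, `LinearRewardModel`, `run`, `lr`) are hypothetical.

```python
import math


class UCB1:
    """UCB1 bandit over named arms; rewards are assumed to lie in [0, 1]."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}
        self.means = {a: 0.0 for a in self.arms}

    def select(self):
        # Play every arm once before applying the confidence bound.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        total = sum(self.counts.values())
        # Empirical mean plus exploration bonus sqrt(2 ln t / n_a).
        return max(
            self.arms,
            key=lambda a: self.means[a]
            + math.sqrt(2.0 * math.log(total) / self.counts[a]),
        )

    def update(self, arm, reward):
        # Incremental update of the arm's empirical mean reward.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


class LinearRewardModel:
    """Linear regression (SGD on squared error) predicting a hypothesis reward."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr  # assumed learning rate, not taken from the paper

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, reward):
        # Gradient step on 0.5 * (prediction - reward)^2.
        err = self.predict(x) - reward
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]

    def best(self, hypotheses):
        # hypotheses: list of (text, feature_vector) pairs.
        return max(hypotheses, key=lambda h: self.predict(h[1]))


def run(bandit, reward_of, rounds):
    """Choose a system per sentence, observe its reward, update the bandit."""
    total = 0.0
    for _ in range(rounds):
        arm = bandit.select()
        reward = reward_of(arm)
        bandit.update(arm, reward)
        total += reward
    return total
```

In this sketch, `UCB1(["seed", "adapted"])` would decide per source sentence which system translates it, while `LinearRewardModel.best` would rerank the hypotheses of the adapted system by predicted reward. Only the reward of the translation actually returned is observed, which is why the regression model is updated one example at a time.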