Batch learning from logged bandit feedback through counterfactual risk minimization
We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, rec...