Optimizing Interactive Systems via Data-Driven Objectives
Ziming Li | Julia Kiseleva | Artem Grotov | Harrie Oosterhuis | Maarten de Rijke
[1] Stefan Ultes,et al. MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling , 2018, EMNLP.
[2] K. Arrow,et al. The New Palgrave Dictionary of Economics , 2020 .
[3] Jan Peters,et al. Relative Entropy Inverse Reinforcement Learning , 2011, AISTATS.
[4] Sergey Levine,et al. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection , 2016, Int. J. Robotics Res..
[5] Julie A. Shah,et al. Human-Machine Collaborative Optimization via Apprenticeship Scheduling , 2018, J. Artif. Intell. Res..
[6] Thorsten Joachims,et al. Accurately interpreting clickthrough data as implicit feedback , 2005, SIGIR '05.
[7] T. Graepel,et al. Private traits and attributes are predictable from digital records of human behavior , 2013, Proceedings of the National Academy of Sciences.
[8] Michael C. Yip,et al. Adversarial Imitation via Variational Inverse Reinforcement Learning , 2018, ICLR.
[9] Anind K. Dey,et al. Modeling Interaction via the Principle of Maximum Causal Entropy , 2010, ICML.
[10] Stefan Ultes,et al. Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management , 2017, SIGDIAL Conference.
[11] Leif Azzopardi,et al. Modelling interaction with economic models of search , 2014, SIGIR.
[12] Léon Bottou,et al. Wasserstein GAN , 2017, ArXiv.
[13] Kam-Fai Wong,et al. Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Joelle Pineau,et al. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models , 2015, AAAI.
[15] Jianfeng Gao,et al. A Diversity-Promoting Objective Function for Neural Conversation Models , 2015, NAACL.
[16] Mounia Lalmas,et al. Absence time and user engagement: evaluating ranking functions , 2013, WSDM '13.
[17] Jason Weston,et al. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks , 2015, ICLR.
[18] Roberto Pieraccini,et al. A stochastic model of human-machine interaction for learning dialog strategies , 2000, IEEE Trans. Speech Audio Process..
[19] Hui Ye,et al. Agenda-Based User Simulation for Bootstrapping a POMDP Dialogue System , 2007, NAACL.
[20] Brian D. Ziebart,et al. Intent Prediction and Trajectory Forecasting via Predictive Inverse Linear-Quadratic Regulation , 2015, AAAI.
[21] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[22] Yi Pan,et al. Conversational AI: The Science Behind the Alexa Prize , 2018, ArXiv.
[23] Maarten de Rijke,et al. Dynamic Query Modeling for Related Content Finding , 2015, SIGIR.
[24] Stefano Ermon,et al. Generative Adversarial Imitation Learning , 2016, NIPS.
[25] Jaana Kekäläinen,et al. Cumulated gain-based evaluation of IR techniques , 2002, TOIS.
[26] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[27] Stephen C. Adams,et al. Multi-agent Inverse Reinforcement Learning for Certain General-sum Stochastic Games , 2019, J. Artif. Intell. Res..
[28] Grace Hui Yang,et al. Session Search by Direct Policy Learning , 2015, ICTIR.
[29] Manuel Lopes,et al. Active Learning for Reward Estimation in Inverse Reinforcement Learning , 2009, ECML/PKDD.
[30] Victor Zue,et al. JUPITER: a telephone-based conversational interface for weather information , 2000, IEEE Trans. Speech Audio Process..
[31] Sergey Levine,et al. A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models , 2016, ArXiv.
[32] Madian Khabsa,et al. Detecting Good Abandonment in Mobile Search , 2016, WWW.
[33] Zeb Kurth-Nelson,et al. Learning to reinforcement learn , 2016, CogSci.
[34] Bing Liu,et al. Adversarial Learning of Task-Oriented Neural Dialog Models , 2018, SIGDIAL Conference.
[35] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[36] Marilyn A. Walker,et al. PARADISE: A Framework for Evaluating Spoken Dialogue Agents , 1997, ACL.
[37] Joelle Pineau,et al. Bootstrapping Dialog Systems with Word Embeddings , 2014 .
[38] Sergey Levine,et al. Feature Construction for Inverse Reinforcement Learning , 2010, NIPS.
[39] Imed Zitouni,et al. Understanding User Satisfaction with Intelligent Assistants , 2016, CHIIR.
[40] Liang Zhang,et al. Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning , 2018, KDD.
[41] Chen Cui,et al. User Attention-guided Multimodal Dialog Systems , 2019, SIGIR.
[42] M. de Rijke,et al. Optimizing Interactive Systems with Data-Driven Objectives , 2018, ArXiv.
[43] Xuanjing Huang,et al. Towards Diverse Text Generation with Inverse Reinforcement Learning , 2018, ArXiv.
[44] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.
[45] Jianfeng Gao,et al. End-to-End Task-Completion Neural Dialogue Systems , 2017, IJCNLP.
[46] Demis Hassabis,et al. Mastering Atari, Go, chess and shogi by planning with a learned model , 2019, Nature.
[47] Hang Li,et al. Neural Responding Machine for Short-Text Conversation , 2015, ACL.
[48] Anca D. Dragan,et al. Learning from Extrapolated Corrections , 2019, 2019 International Conference on Robotics and Automation (ICRA).
[49] J. Perner,et al. Development of theory of mind and executive control , 1999, Trends in Cognitive Sciences.
[50] Katja Hofmann,et al. Reusing historical interaction data for faster online learning to rank for IR , 2013, DIR.
[51] Alan Ritter,et al. Unsupervised Modeling of Twitter Conversations , 2010, NAACL.
[52] Jiliang Tang,et al. A Survey on Dialogue Systems: Recent Advances and New Frontiers , 2017, SIGKDD Explor..
[53] Kam-Fai Wong,et al. Integrating planning for task-completion dialogue policy learning , 2018, ACL.
[54] Jianfeng Gao,et al. Guided Dialog Policy Learning without Adversarial Learning in the Loop , 2020, EMNLP.
[55] Rafael E. Banchs. Movie-DiC: a Movie Dialogue Corpus for Research and Development , 2012, ACL.
[56] Jianfeng Gao,et al. Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning , 2018, EMNLP.
[57] Geoffrey Zweig,et al. Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning , 2017, ACL.
[58] Sergey Levine,et al. Nonlinear Inverse Reinforcement Learning with Gaussian Processes , 2011, NIPS.
[59] Eelco Herder,et al. Web page revisitation revisited: implications of a long-term click-stream study of browser usage , 2007, CHI.
[60] Jacob Cohen,et al. The Equivalence of Weighted Kappa and the Intraclass Correlation Coefficient as Measures of Reliability , 1973 .
[61] Jianfeng Gao,et al. Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access , 2016, ACL.
[62] W. Bruce Croft,et al. Neural Ranking Models with Weak Supervision , 2017, SIGIR.
[63] M. de Rijke,et al. The Impact of Linkage Methods in Hierarchical Clustering for Active Learning to Rank , 2017, SIGIR.
[64] Richard S. Zemel,et al. SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies , 2019, NeurIPS.
[65] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[66] David Vandyke,et al. Policy committee for adaptation in multi-domain spoken dialogue systems , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[67] Jianfeng Gao,et al. Deep Reinforcement Learning for Dialogue Generation , 2016, EMNLP.
[68] M. de Rijke,et al. Evaluating Personal Assistants on Mobile devices , 2017, ArXiv.
[69] Olivier Pietquin,et al. Inverse reinforcement learning for interactive systems , 2013, MLIS '13.
[70] Anca D. Dragan,et al. Reward-rational (implicit) choice: A unifying formalism for reward learning , 2020, NeurIPS.
[71] Peter L. Bartlett,et al. RL²: Fast Reinforcement Learning via Slow Reinforcement Learning , 2016, ArXiv.
[72] Ed H. Chi,et al. Top-K Off-Policy Correction for a REINFORCE Recommender System , 2018, WSDM.
[73] Milica Gasic,et al. Gaussian Processes for POMDP-Based Dialogue Manager Optimization , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[74] Kam-Fai Wong,et al. Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning , 2017, EMNLP.
[75] Diane Kelly. When Effort Exceeds Expectations: A Theory of Search Task Difficulty (keynote) , 2015, SCST@ECIR.
[76] Joelle Pineau,et al. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation , 2016, EMNLP.
[77] Victor Zue,et al. Conversational interfaces: advances and challenges , 1997, Proceedings of the IEEE.
[78] Jianfeng Gao,et al. BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems , 2016, AAAI.
[79] Jianfeng Gao,et al. deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets , 2015, ACL.
[80] M. de Rijke,et al. Conversations with Documents: An Exploration of Document-Centered Assistance , 2020, CHIIR.
[81] Sergey Levine,et al. One-Shot Visual Imitation Learning via Meta-Learning , 2017, CoRL.
[82] Ryen W. White,et al. Evaluating implicit feedback models using searcher simulations , 2005, TOIS.
[83] Madian Khabsa,et al. Is This Your Final Answer?: Evaluating the Effect of Answers on Good Abandonment in Mobile Search , 2016, SIGIR.
[84] Frank Lovett. Rational Choice Theory and Explanation , 2006 .
[85] Samy Bengio,et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.
[86] Wei-Ying Ma,et al. Topic Aware Neural Response Generation , 2016, AAAI.
[87] Markus Wulfmeier,et al. Maximum Entropy Deep Inverse Reinforcement Learning , 2015, ArXiv.
[88] Thorsten Joachims,et al. Shaping Feedback Data in Recommender Systems with Interventions Based on Information Foraging Theory , 2019, WSDM.
[89] Diane Kelly,et al. Methods for Evaluating Interactive Information Retrieval Systems with Users , 2009, Found. Trends Inf. Retr..
[90] Alan Ritter,et al. Data-Driven Response Generation in Social Media , 2011, EMNLP.
[91] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[92] M. de Rijke,et al. Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning , 2018, AAAI.
[93] Ali Farhadi,et al. Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[94] Minlie Huang,et al. Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog , 2019, EMNLP.
[95] Hal R. Varian,et al. Economics and search , 1999, SIGIR Forum.
[96] Grace Hui Yang,et al. Learning to Reinforce Search Effectiveness , 2015, ICTIR.
[97] Katja Hofmann,et al. Balancing Exploration and Exploitation in Listwise and Pairwise Online Learning to Rank for Information Retrieval , 2013, Inf. Retr..
[98] Alan Ritter,et al. Adversarial Learning for Neural Dialogue Generation , 2017, EMNLP.
[99] Shane Legg,et al. Deep Reinforcement Learning from Human Preferences , 2017, NIPS.
[100] Guy Shani,et al. An MDP-Based Recommender System , 2002, J. Mach. Learn. Res..
[101] Sungjin Lee,et al. ConvLab: Multi-Domain End-to-End Dialog System Platform , 2019, ACL.
[102] Filip Radlinski,et al. Relevance and Effort: An Analysis of Document Utility , 2014, CIKM.
[103] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[104] Eric Crestan,et al. Modelling and Detecting Changes in User Satisfaction , 2014, CIKM.
[105] Nicholas Jing Yuan,et al. DRN: A Deep Reinforcement Learning Framework for News Recommendation , 2018, WWW.
[106] Anind K. Dey,et al. Probabilistic pointing target prediction via inverse optimal control , 2012, IUI '12.
[107] Joelle Pineau,et al. A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues , 2016, AAAI.
[108] Imed Zitouni,et al. Predicting User Satisfaction with Intelligent Assistants , 2016, SIGIR.
[109] Ameet Talwalkar,et al. Foundations of Machine Learning , 2012, Adaptive computation and machine learning.
[110] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[111] Gleb Gusev,et al. Engagement Periodicity in Search Engine Usage: Analysis and its Application to Search Quality Evaluation , 2015, WSDM.
[112] Antoine Raux,et al. The Dialog State Tracking Challenge Series , 2014, AI Mag..
[113] Alexander I. Rudnicky,et al. A Wizard-of-Oz Study on A Non-Task-Oriented Dialog Systems That Reacts to User Engagement , 2016, SIGDIAL Conference.
[114] Alexander I. Rudnicky,et al. Creating natural dialogs in the carnegie mellon communicator system , 1999, EUROSPEECH.
[115] Joelle Pineau,et al. Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses , 2017, ACL.
[116] Jacob W. Crandall,et al. Towards Minimizing Disappointment in Repeated Games , 2014, J. Artif. Intell. Res..
[117] Steve Fox,et al. Evaluating implicit measures to improve web search , 2005, TOIS.
[118] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[119] Mohammad Norouzi,et al. Dream to Control: Learning Behaviors by Latent Imagination , 2019, ICLR.
[120] Jianfeng Gao,et al. A Neural Network Approach to Context-Sensitive Generation of Conversational Responses , 2015, NAACL.
[121] Ferran Argelaguet,et al. The role of interaction in virtual embodiment: Effects of the virtual hand representation , 2016, 2016 IEEE Virtual Reality (VR).
[122] Katja Hofmann,et al. Balancing Exploration and Exploitation in Learning to Rank Online , 2011, ECIR.
[123] Anca D. Dragan,et al. Learning Human Objectives by Evaluating Hypothetical Behavior , 2019, ICML.
[124] Jianfeng Gao,et al. A User Simulator for Task-Completion Dialogues , 2016, ArXiv.
[125] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..
[126] Marcin Andrychowicz,et al. Solving Rubik's Cube with a Robot Hand , 2019, ArXiv.
[127] Vasile Rus,et al. A Comparison of Greedy and Optimal Assessment of Natural Language Student Input Using Word-to-Word Similarity Metrics , 2012, BEA@NAACL-HLT.
[128] Harry Shum,et al. From Eliza to XiaoIce: challenges and opportunities with social chatbots , 2018, Frontiers of Information Technology & Electronic Engineering.
[129] Sergey Levine,et al. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning , 2017, ICLR.
[130] Shane Legg,et al. Scalable agent alignment via reward modeling: a research direction , 2018, ArXiv.
[131] Ben Poole,et al. Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.
[132] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[133] M. de Rijke,et al. A Neural Click Model for Web Search , 2016, WWW.
[134] Nicholas Jing Yuan,et al. Beyond the Words: Predicting User Personality from Heterogeneous Information , 2017, WSDM.
[135] Jaime Teevan,et al. How people recall, recognize, and reuse search results , 2008, ACM Trans. Inf. Syst..
[136] Joseph Weizenbaum,et al. ELIZA—a computer program for the study of natural language communication between man and machine , 1966, CACM.
[137] E. Gumbel. Statistical Theory of Extreme Values and Some Practical Applications : A Series of Lectures , 1954 .
[138] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[139] David Vandyke,et al. On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems , 2016, ACL.
[140] Ryen W. White,et al. Finding relevant documents using top ranking sentences: an evaluation of two alternative schemes , 2002, SIGIR '02.
[141] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[142] M. de Rijke,et al. Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems , 2020, FINDINGS.
[143] Ryen W. White. Interactions with Search Systems , 2016 .
[144] Lantao Yu,et al. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient , 2016, AAAI.
[145] Paul B. Kantor,et al. A study of information seeking and retrieving. II. Users, questions, and effectiveness , 1988 .
[146] Jianfeng Gao,et al. A Persona-Based Neural Conversation Model , 2016, ACL.
[147] M. de Rijke,et al. Towards Learning Reward Functions from User Interactions , 2017, ICTIR.
[148] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[149] J. Andrew Bagnell,et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy , 2010 .
[150] Matthew Lease,et al. Correlation and Prediction of Evaluation Metrics in Information Retrieval , 2018, ArXiv.
[151] Paul B. Kantor,et al. A study of information seeking and retrieving. I. background and methodology , 1988 .
[152] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[153] Carlo Tomasi,et al. Distance Minimization for Reward Learning from Scored Trajectories , 2016, AAAI.
[154] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[155] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.
[156] Wei Chu,et al. Cohort modeling for enhanced personalized search , 2014, SIGIR.
[157] Sergey Levine,et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization , 2016, ICML.
[158] David M. Bradley,et al. Boosting Structured Prediction for Imitation Learning , 2006, NIPS.
[159] Tefko Saracevic,et al. RELEVANCE: A review of and a framework for the thinking on the notion in information science , 1997, J. Am. Soc. Inf. Sci..
[160] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[161] J. Mitchell,et al. Dynamic versus static menus: an exploratory comparison , 1989, SIGCHI Bull..
[162] Filip Radlinski,et al. Preference elicitation as an optimization problem , 2018, RecSys.
[163] Steve J. Young,et al. Partially observable Markov decision processes for spoken dialog systems , 2007, Comput. Speech Lang..
[164] David Silver,et al. Learning to search: Functional gradient techniques for imitation learning , 2009, Auton. Robots.
[165] Romain Laroche,et al. Reward Shaping for Statistical Optimisation of Dialogue Management , 2013, SLSP.
[166] Jaap Kamps,et al. Behavioral Dynamics from the SERP's Perspective: What are Failed SERPs and How to Fix Them? , 2015, CIKM.
[167] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.