Building Document Treatment Chains Using Reinforcement Learning and Intuitive Feedback

We model a document treatment chain as a Markov Decision Process and use reinforcement learning to let an agent learn to construct, and continuously improve, custom-made chains "on the fly". We build a platform that enables us to measure the impact of various models, web services, algorithms, and parameters on learning. We apply this in an industrial setting, specifically to an open-source document treatment chain that extracts events from massive volumes of web pages and other open-source documents. Our emphasis is on minimising the burden on the human analysts, from whose feedback on the extracted events the agent learns to improve. To this end, we investigate several types of feedback, ranging from numerical feedback, which requires substantial tuning, to partially and even fully qualitative feedback, which is far more intuitive and demands little to no user calibration. We first carry out experiments with numerical feedback, then demonstrate that intuitive feedback still allows the agent to learn effectively.
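
The paper itself gives no code, but the following minimal sketch illustrates the kind of loop the abstract describes: a tabular Q-learning agent assembles a treatment chain service by service and receives a reward derived from analyst feedback on the result. Everything concrete here is an illustrative assumption: the service names (tokenize, ner, event_extract, dedupe), the hard-coded analyst_reward stand-in, and the hyperparameters are not taken from the paper.

```python
import random
from collections import defaultdict

# Hypothetical candidate web services the agent can append to the chain.
SERVICES = ["tokenize", "ner", "event_extract", "dedupe"]
CHAIN_LENGTH = 3
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

def analyst_reward(chain):
    """Stand-in for analyst feedback on the extracted events.

    A real system would query the analyst (numerically or
    qualitatively); here we hard-code a preference for chains
    that run NER before event extraction."""
    if ("ner" in chain and "event_extract" in chain
            and chain.index("ner") < chain.index("event_extract")):
        return 1.0
    return 0.0

Q = defaultdict(float)  # Q[(state, action)]; state = chain built so far

def choose(state):
    if random.random() < EPSILON:                       # explore
        return random.choice(SERVICES)
    return max(SERVICES, key=lambda a: Q[(state, a)])   # exploit

for episode in range(5000):
    chain = []
    while len(chain) < CHAIN_LENGTH:
        state = tuple(chain)
        action = choose(state)
        chain.append(action)
        done = len(chain) == CHAIN_LENGTH
        reward = analyst_reward(chain) if done else 0.0
        next_state = tuple(chain)
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in SERVICES)
        # Standard Q-learning update
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                       - Q[(state, action)])

# Read off the greedy chain after learning
chain = []
while len(chain) < CHAIN_LENGTH:
    chain.append(max(SERVICES, key=lambda a: Q[(tuple(chain), a)]))
print("learned chain:", chain)
```

Under the fully qualitative setting the abstract mentions, the scalar analyst_reward would be replaced by pairwise preferences between candidate chains, in the spirit of preference-based reinforcement learning.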
