Interactive Learning for Sequential Decisions and Predictions

Sequential prediction problems arise commonly in many areas of robotics and information processing: e.g., predicting a sequence of actions over time to achieve a goal in a control task, interpreting an image through a sequence of local image patch classifications, or translating speech to text through an iterative decoding procedure. Learning predictors that can reliably perform such sequential tasks is challenging. Specifically, because predictions influence future inputs in the sequence, the data-generation process and the executed predictor are inextricably intertwined. This often leads to a significant mismatch between the distribution of examples observed during training (induced by the predictor used to generate training instances) and the distribution encountered during test executions (induced by the learned predictor). As a result, naively applying standard supervised learning methods, which assume independently and identically distributed training and test examples, often leads to poor test performance and compounding errors: inaccurate predictions lead to situations never seen during training, where further errors are inevitable. This thesis proposes general iterative learning procedures that leverage interactions between the learner and the teacher to provably learn good predictors for sequential prediction tasks. Through repeated interactions, our approaches can efficiently learn predictors that are robust to their own errors and predict accurately during test executions. Our main approach uses existing no-regret online learning methods to provide strong generalization guarantees on test performance. We demonstrate how to apply our main approach in various sequential prediction settings: imitation learning, model-free reinforcement learning, system identification, structured prediction, and submodular list prediction.
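The iterative learner-teacher procedure described above can be illustrated with a minimal sketch of a DAgger-style (dataset aggregation) loop, one well-known instance of this kind of interaction. Everything concrete here is an assumption made for illustration, not taken from the thesis: the gusty integer-track task, the hypothetical names `expert`, `TabularPolicy`, `GUSTS` and `MAX_POS`, the cost sum(|x|), and the mixing schedule (the teacher is rolled out only in the first iteration, the learner afterwards). Refitting a lookup table on all data gathered so far plays the role of a follow-the-leader online learner; with a majority-vote table this sketch carries no formal no-regret guarantee, and it simply returns the last policy instead of selecting one from the sequence.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy task (an assumption for illustration, not from the thesis):
# an agent on an integer track is pushed by random gusts and should stay near 0.
MAX_POS = 10
GUSTS = (-2, -1, 0, 0, 0, 0, 0, 0, 1, 2)  # mostly calm, occasional large gusts


def expert(x):
    """Teacher policy: take one step back toward the origin."""
    return -1 if x > 0 else (1 if x < 0 else 0)


def step(x, a, rng):
    """Apply action a plus a random gust, clipped to the ends of the track."""
    return max(-MAX_POS, min(MAX_POS, x + a + rng.choice(GUSTS)))


class TabularPolicy:
    """Majority-vote lookup table; states never seen in training get `default`."""

    def __init__(self, data, default=0):
        votes = defaultdict(Counter)
        for x, a in data:
            votes[x][a] += 1
        self.table = {x: c.most_common(1)[0][0] for x, c in votes.items()}
        self.default = default

    def __call__(self, x):
        return self.table.get(x, self.default)


def rollout(policy, horizon, rng):
    """Run `policy` from x = 0; return the visited states and the cost sum(|x|)."""
    x, states, cost = 0, [], 0
    for _ in range(horizon):
        states.append(x)
        x = step(x, policy(x), rng)
        cost += abs(x)
    return states, cost


def dagger(iterations, episodes, horizon, seed=0):
    """DAgger-style loop with beta_1 = 1 and beta_i = 0 afterwards.

    Iteration 1 rolls out the teacher (on its own, this is behavior cloning);
    later iterations roll out the current learner, ask the teacher to label
    every state the learner visited, and refit on the aggregated dataset.
    """
    rng = random.Random(seed)
    data, policy = [], expert
    for _ in range(iterations):
        for _ in range(episodes):
            states, _ = rollout(policy, horizon, rng)
            data.extend((x, expert(x)) for x in states)  # teacher labels
        policy = TabularPolicy(data)  # refit on everything collected so far
    return policy, data


def average_cost(policy, episodes=200, horizon=20, seed=1):
    """Mean episode cost of `policy` over freshly sampled test episodes."""
    rng = random.Random(seed)
    return sum(rollout(policy, horizon, rng)[1] for _ in range(episodes)) / episodes


if __name__ == "__main__":
    bc_policy, _ = dagger(iterations=1, episodes=10, horizon=20)
    dagger_policy, _ = dagger(iterations=5, episodes=10, horizon=20)
    print("behavior cloning:", average_cost(bc_policy))
    print("DAgger-style:    ", average_cost(dagger_policy))
```

With a lookup-table learner, behavior cloning (`iterations=1`) can only go wrong in states the demonstrations never reached, after which its default action lets it wander; the later iterations add teacher labels for exactly those learner-visited states. How large the resulting gap is depends on the seed and on how often large gusts occur, so the printed numbers are illustrations, not results from the thesis.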
Its efficiency and wide applicability are exhibited over a large variety of challenging learning tasks, ranging from learning video-game-playing agents from human players and accurate dynamic models of a simulated helicopter for controller synthesis, to learning predictors for scene understanding in computer vision, news recommendation, and document summarization. We also demonstrate the applicability of our technique on a real robot, using pilot demonstrations to train an autonomous quadrotor to avoid trees seen through its onboard camera (monocular vision) when flying at low altitude in natural forest environments. Our results show throughout that, unlike typical supervised learning tasks where examples of good behavior are sufficient to learn good predictors, interaction is a fundamental part of learning in sequential tasks. We show formally that some level of interaction is necessary: without interaction, no learning algorithm can guarantee good performance in general.
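Why non-interactive learning struggles can be made concrete with the standard worst-case argument for compounding errors in imitation, sketched informally here in our own notation (an assumption, not the thesis's): T is the horizon, per-step costs lie in [0, 1], π* is the teacher, π̂ the learned policy, J(·) the expected total cost, and ε_t the probability that π̂ disagrees with π* at step t on states drawn from the teacher's own state distribution. Before its first mistake π̂ can follow the teacher exactly; after it, π̂ may reach unfamiliar states and, in the worst case, pay full cost for the rest of the episode:

$$
\begin{aligned}
J(\hat\pi) - J(\pi^*) &\le \sum_{t=1}^{T} \Pr[\text{first mistake at step } t]\,(T - t + 1) \\
&\le \sum_{t=1}^{T} \epsilon_t \, T \;=\; T^2 \epsilon, \qquad \epsilon = \frac{1}{T}\sum_{t=1}^{T} \epsilon_t .
\end{aligned}
$$

The quadratic dependence on T is the compounding effect. Interactive no-regret procedures instead train on the learner's own state distribution; in the dataset-aggregation analysis, if a single deviation raises the teacher's cost-to-go by at most u, the bound becomes roughly linear in T, namely J(π̂) ≤ J(π*) + uTε_N + O(1), where ε_N is the best training loss achievable by the policy class on the data aggregated over N iterations.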

Stéphane Ross
