Interactive Learning for Sequential Decisions and Predictions
[1] D. Horvitz,et al. A Generalization of Sampling Without Replacement from a Finite Universe , 1952 .
[2] R. E. Kalman,et al. Contributions to the Theory of Optimal Control , 1960 .
[3] B. L. Ho,et al. Editorial: Effective construction of linear state-variable models from input/output functions , 1966 .
[4] Karl Johan Åström,et al. Numerical Identification of Linear Dynamic Systems from Normal Operating Records , 1965 .
[5] A. G. Butkovskiy,et al. Optimal control of systems , 1966 .
[6] Edward J. Sondik,et al. The optimal control of partially observable Markov processes , 1971 .
[7] Karl Johan Åström,et al. BOOK REVIEW SYSTEM IDENTIFICATION , 1994, Econometric Theory.
[8] H. Akaike. Markovian Representation of Stochastic Processes by Canonical Variables , 1975 .
[9] M. L. Fisher,et al. An analysis of approximations for maximizing submodular set functions—I , 1978, Math. Program..
[10] L. Ljung. Convergence analysis of parametric identification methods , 1978 .
[11] Y. Bar-Shalom. Stochastic dynamic programming: Caution and probing , 1981 .
[12] Leslie G. Valiant,et al. A theory of the learnable , 1984, STOC '84.
[13] Jean-Claude Latombe,et al. An Approach to Automatic Robot Programming Based on Inductive Learning , 1984 .
[14] Lennart Ljung,et al. Optimal experiment designs with respect to the intended model application , 1986, Autom..
[15] Lennart Ljung,et al. System Identification: Theory for the User , 1987 .
[16] Christopher G. Harris,et al. A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.
[17] Dean Pomerleau,et al. ALVINN, an autonomous land vehicle in a neural network , 2015 .
[18] Balas K. Natarajan,et al. On learning sets and functions , 2004, Machine Learning.
[19] Judea Pearl,et al. Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.
[20] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[21] Vladimir Vovk,et al. Universal Forecasting Algorithms , 1992, Inf. Comput..
[22] David Haussler,et al. How to use expert advice , 1993, STOC.
[23] Christopher G. Atkeson,et al. Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming , 1993, NIPS.
[24] Manfred K. Warmuth,et al. The Weighted Majority Algorithm , 1994, Inf. Comput..
[25] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[26] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[27] Vladimir Vapnik,et al. The Nature of Statistical Learning , 1995 .
[28] L. Ljung. Nonlinear Black Box Models in Systems Identification , 1997 .
[29] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.
[30] Lennart Ljung,et al. Nonlinear black-box modeling in system identification: a unified overview , 1995, Autom..
[31] B. Pasik-Duncan,et al. Adaptive Control , 1996, IEEE Control Systems.
[32] Håkan Hjalmarsson,et al. For model-based control design, closed-loop identification gives better performance , 1996, Autom..
[33] Stefan Schaal,et al. Robot Learning From Demonstration , 1997, ICML.
[34] Robert Tibshirani,et al. Classification by Pairwise Coupling , 1997, NIPS.
[35] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[36] Manuela M. Veloso,et al. Tree Based Discretization for Continuous State Space Reinforcement Learning , 1998, AAAI/IAAI.
[37] Olga Veksler,et al. Fast approximate energy minimization via graph cuts , 2001, Proceedings of the Seventh IEEE International Conference on Computer Vision.
[38] Lennart Ljung,et al. Closed-loop identification revisited , 1999, Autom..
[39] Michael Kearns,et al. Efficient Reinforcement Learning in Factored MDPs , 1999, IJCAI.
[40] Stefan Schaal,et al. Is imitation learning the route to humanoid robots? , 1999, Trends in Cognitive Sciences.
[41] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[42] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.
[43] Lennart Ljung,et al. Some results on optimal experiment design , 2000, Autom..
[44] Ole Winther,et al. TAP Gibbs Free Energy, Belief Propagation and Sparsity , 2001, NIPS.
[45] M. Opper,et al. Advanced mean field methods: theory and practice , 2001 .
[46] T. Başar,et al. A New Approach to Linear Filtering and Prediction Problems , 2001 .
[47] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[48] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[49] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[50] Jonathan P. How,et al. Receding horizon control of autonomous aerial vehicles , 2002, Proceedings of the 2002 American Control Conference (IEEE Cat. No.CH37301).
[51] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[52] Ben Taskar,et al. Max-Margin Markov Networks , 2003, NIPS.
[53] Joelle Pineau,et al. Point-based value iteration: An anytime algorithm for POMDPs , 2003, IJCAI.
[54] Jeff G. Schneider,et al. Covariant policy search , 2003, IJCAI 2003.
[55] Santosh S. Vempala,et al. Efficient algorithms for online decision problems , 2005, J. Comput. Syst. Sci..
[56] Jeff G. Schneider,et al. Policy Search by Dynamic Programming , 2003, NIPS.
[57] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.
[58] H. Sebastian Seung,et al. Selective Sampling Using the Query by Committee Algorithm , 1997, Machine Learning.
[59] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[60] Vladimir Kolmogorov,et al. What energy functions can be minimized via graph cuts? , 2002, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[61] Stuart J. Russell,et al. Probabilistic graphical models and algorithms for genomic analysis , 2004 .
[62] Emanuel Todorov,et al. Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems , 2004, ICINCO.
[63] Andrew W. Moore,et al. Locally Weighted Learning for Control , 1997, Artificial Intelligence Review.
[64] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[65] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.
[66] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[67] Bianca Zadrozny,et al. Learning and evaluating classifiers under sample selection bias , 2004, ICML.
[68] László Lovász,et al. Approximating Min Sum Set Cover , 2004, Algorithmica.
[69] Claudio Gentile,et al. On the generalization ability of on-line learning algorithms , 2001, IEEE Transactions on Information Theory.
[70] R. Schapire,et al. Toward efficient agnostic learning , 1992, COLT '92.
[71] Michael L. Littman,et al. An empirical evaluation of interval estimation for Markov decision processes , 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.
[72] Zijiang J. He,et al. Perceiving distance accurately by a directional process of integrating ground information , 2004, Nature.
[73] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[74] Larry Wasserman,et al. All of Statistics: A Concise Course in Statistical Inference , 2004 .
[75] E. R. Davies,et al. Machine vision - theory, algorithms, practicalities , 2004 .
[76] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[77] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[78] Nikos A. Vlassis,et al. Perseus: Randomized Point-based Value Iteration for POMDPs , 2005, J. Artif. Intell. Res..
[79] Thomas P. Hayes,et al. Error limiting reductions between classification tasks , 2005, ICML.
[80] Csaba Szepesvári,et al. Finite time bounds for sampling based fitted value iteration , 2005, ICML.
[81] Pieter Abbeel,et al. Exploration and apprenticeship learning in reinforcement learning , 2005, ICML.
[82] Ashutosh Saxena,et al. High speed obstacle avoidance using monocular vision and reinforcement learning , 2005, ICML.
[83] Ben Taskar,et al. Discriminative learning of Markov random fields for segmentation of 3D scan data , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[84] William W. Cohen,et al. Stacked Sequential Learning , 2005, IJCAI.
[85] Martial Hebert,et al. Exploiting Inference for Approximate Parameter Learning in Discriminative Fields: An Empirical Study , 2005, EMMCVPR.
[86] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[87] Reid G. Simmons,et al. Point-Based POMDP Algorithms: Improved Analysis and Implementation , 2005, UAI.
[88] Thomas Hofmann,et al. Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..
[89] J. Felsenstein. Evolutionary trees from DNA sequences: A maximum likelihood approach , 2005, Journal of Molecular Evolution.
[90] John Langford,et al. Sensitive Error Correcting Output Codes , 2005, COLT.
[91] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.
[92] Alexei A. Efros,et al. Recovering Surface Layout from an Image , 2007, International Journal of Computer Vision.
[93] Yan Liu. Conditional Graphical Models for Protein Structure Prediction , 2006 .
[94] Ian McGraw,et al. Residual Belief Propagation: Informed Scheduling for Asynchronous Message Passing , 2006, UAI.
[95] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[96] David M. Bradley,et al. Boosting Structured Prediction for Imitation Learning , 2006, NIPS.
[97] Bernhard Schölkopf,et al. Correcting Sample Selection Bias by Unlabeled Data , 2006, NIPS.
[98] Robert E. Schapire,et al. Algorithms for portfolio management based on the Newton method , 2006, ICML.
[99] J. Andrew Bagnell,et al. Terrain Classification from Aerial Data to Support Ground Vehicle Navigation , 2006 .
[100] Anonymous Author. Robust Reductions from Ranking to Classification , 2006 .
[101] Eric P. Xing,et al. HM-BiTAM: Bilingual Topic Exploration, Word Alignment, and Translation , 2007, NIPS.
[102] Vahab S. Mirrokni,et al. Maximizing Non-Monotone Submodular Functions , 2011, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).
[103] Edward H. Adelson,et al. Learning Gaussian Conditional Random Fields for Low-Level Vision , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[104] Fernando Pereira,et al. Structured Learning with Approximate Inference , 2007, NIPS.
[105] Nathan Ratliff,et al. (Online) Subgradient Methods for Structured Prediction , 2007 .
[106] Sanjoy Dasgupta,et al. A General Agnostic Active Learning Algorithm , 2007, ISAIM.
[107] Christopher Joseph Pal,et al. Learning Conditional Random Fields for Stereo , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[108] Roberto Cipolla,et al. Assisted Video Object Labeling By Joint Tracking of Regions and Keypoints , 2007, 2007 IEEE 11th International Conference on Computer Vision.
[109] Michael L. Littman,et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning , 2007, NIPS.
[110] J. Langford,et al. The Epoch-Greedy algorithm for contextual multi-armed bandits , 2007, NIPS 2007.
[111] Siddhartha S. Srinivasa,et al. Imitation learning for locomotion and manipulation , 2007, 2007 7th IEEE-RAS International Conference on Humanoid Robots.
[112] Elad Hazan,et al. Logarithmic regret algorithms for online convex optimization , 2006, Machine Learning.
[113] Sebastian Scherer,et al. Flying Fast and Low Among Obstacles , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.
[114] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[115] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.
[116] David Silver,et al. High Performance Outdoor Navigation from Overhead Data using Imitation Learning , 2008, Robotics: Science and Systems.
[117] Manfred K. Warmuth,et al. Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension , 2008 .
[118] Joelle Pineau,et al. Model-Based Bayesian Reinforcement Learning in Large Structured Domains , 2008, UAI.
[119] Matthew J. Streeter,et al. An Online Algorithm for Maximizing Submodular Functions , 2008, NIPS.
[120] Ambuj Tewari,et al. On the Generalization Ability of Online Strongly Convex Programming Algorithms , 2008, NIPS.
[121] Filip Radlinski,et al. Learning diverse rankings with multi-armed bandits , 2008, ICML '08.
[122] Thorsten Joachims,et al. Training structural SVMs when exact inference is intractable , 2008, ICML '08.
[123] Sham M. Kakade,et al. Mind the Duality Gap: Logarithmic regret algorithms for online optimization , 2008, NIPS.
[124] Peter Stone,et al. Hierarchical model-based reinforcement learning: R-max + MAXQ , 2008, ICML '08.
[125] Thorsten Joachims,et al. Predicting diverse subsets using structural SVMs , 2008, ICML '08.
[126] Stefan Schaal,et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients , 2008 .
[127] Kilian Q. Weinberger,et al. Feature hashing for large scale multitask learning , 2009, ICML '09.
[128] Julian Togelius,et al. Mario AI competition , 2009, 2009 IEEE Symposium on Computational Intelligence and Games.
[129] John Langford,et al. Error-Correcting Tournaments , 2009, ALT.
[130] Sham M. Kakade,et al. A spectral algorithm for learning Hidden Markov Models , 2008, J. Comput. Syst. Sci..
[131] Lin Xiao,et al. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization , 2009, J. Mach. Learn. Res..
[132] Siddhartha S. Srinivasa,et al. CHOMP: Gradient optimization techniques for efficient motion planning , 2009, 2009 IEEE International Conference on Robotics and Automation.
[133] Peter Stone,et al. Generalized model learning for reinforcement learning in factored domains , 2009, AAMAS.
[134] John Langford,et al. Search-based structured prediction , 2009, Machine Learning.
[135] Martial Hebert,et al. Contextual classification with functional Max-Margin Markov Networks , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[136] Brett Browning,et al. A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..
[137] Nicholas Roy,et al. Autonomous Flight in Unknown Indoor Environments , 2009 .
[138] Quoc V. Le,et al. Proximal regularization for online and batch learning , 2009, ICML '09.
[139] Lihong Li,et al. Reinforcement Learning in Finite MDPs: PAC Analysis , 2009, J. Mach. Learn. Res..
[140] Thomas J. Walsh,et al. Exploring compact reinforcement-learning representations with linear regression , 2009, UAI.
[141] Andreas Krause,et al. Online Learning of Assignments , 2009, NIPS.
[142] Nathan Ratliff,et al. Learning to search: structured prediction techniques for imitation learning , 2009 .
[143] John Langford,et al. Agnostic active learning , 2006, J. Comput. Syst. Sci..
[144] Manuela M. Veloso,et al. Interactive Policy Learning through Confidence-Based Autonomy , 2014, J. Artif. Intell. Res..
[145] Andrey Bernstein,et al. Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains , 2010, Machine Learning.
[146] Zhuowen Tu,et al. Auto-Context and Its Application to High-Level Vision Tasks and 3D Brain Image Segmentation , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[147] Matthew J. Streeter,et al. Adaptive Bound Optimization for Online Convex Optimization , 2010, COLT 2010.
[148] Ambuj Tewari,et al. Composite objective mirror descent , 2010, COLT 2010.
[149] Martial Hebert,et al. Stacked Hierarchical Labeling , 2010, ECCV.
[150] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[151] Robert E. Schapire,et al. A Reduction from Apprenticeship Learning to Classification , 2010, NIPS.
[152] John Langford,et al. Agnostic Active Learning Without Constraints , 2010, NIPS.
[153] Byron Boots,et al. Closing the learning-planning loop with predictive state representations , 2009, Int. J. Robotics Res..
[154] Byron Boots,et al. Reduced-Rank Hidden Markov Models , 2009, AISTATS.
[155] Horst Bischof,et al. Motion estimation with non-local total variation regularization , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[156] J. Andrew Bagnell,et al. Efficient Reductions for Imitation Learning , 2010, AISTATS.
[157] Csaba Szepesvári,et al. Model-based reinforcement learning with nearly tight exploration complexity bounds , 2010, ICML.
[158] Hsuan-Tien Lin,et al. One-sided Support Vector Regression for Multiclass Cost-sensitive Classification , 2010, ICML.
[159] Hui Lin,et al. Multi-document Summarization via Budgeted Maximization of Submodular Functions , 2010, NAACL.
[160] Qiang Yang,et al. A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.
[161] Anind K. Dey,et al. Modeling Interaction via the Principle of Maximum Causal Entropy , 2010, ICML.
[162] David Silver,et al. Learning Preference Models for Autonomous Mobile Robots in Complex Domains , 2010 .
[163] Ohad Shamir,et al. Learnability, Stability and Uniform Convergence , 2010, J. Mach. Learn. Res..
[164] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[165] Ashutosh Kumar Singh,et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2010 .
[166] John Langford,et al. Contextual Bandit Algorithms with Supervised Learning Guarantees , 2010, AISTATS.
[167] Sanjiv Singh,et al. A cascaded method to detect aircraft in video imagery , 2011, Int. J. Robotics Res..
[168] Wei-Yin Loh,et al. Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..
[169] J. Andrew Bagnell,et al. Stability Conditions for Online Learnability , 2011, ArXiv.
[170] John Langford,et al. Doubly Robust Policy Evaluation and Learning , 2011, ICML.
[171] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[172] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[173] Bart De Moor,et al. Subspace Identification for Linear Systems: Theory ― Implementation ― Applications , 2011 .
[174] Martial Hebert,et al. 3-D scene analysis via sequenced predictions over points and regions , 2011, 2011 IEEE International Conference on Robotics and Automation.
[175] H. Brendan McMahan,et al. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization , 2011, AISTATS.
[176] Yisong Yue,et al. Linear Submodular Bandits and their Application to Diversified Retrieval , 2011, NIPS.
[177] John Langford,et al. Efficient Optimal Learning for Contextual Bandits , 2011, UAI.
[178] Ben Taskar,et al. Learning Determinantal Point Processes , 2011, UAI.
[179] Hui Lin,et al. A Class of Submodular Functions for Document Summarization , 2011, ACL.
[180] Byron Boots,et al. An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems , 2011, AAAI.
[181] Shie Mannor,et al. Decoupling Exploration and Exploitation in Multi-Armed Bandits , 2012, ICML.
[182] He He,et al. Imitation Learning by Coaching , 2012, NIPS.
[183] J. Andrew Bagnell,et al. Agnostic System Identification for Model-Based Reinforcement Learning , 2012, ICML.
[184] Hui Lin,et al. Learning Mixtures of Submodular Shells with Application to Document Summarization , 2012, UAI.
[185] Pushmeet Kohli,et al. Multiple Choice Learning: Learning to Produce Multiple Structured Outputs , 2012, NIPS.
[186] Nicholas Roy,et al. State estimation for aggressive flight in GPS-denied environments using onboard sensing , 2012, 2012 IEEE International Conference on Robotics and Automation.
[187] J. Andrew Bagnell,et al. Efficient Optimization of Control Libraries , 2011, AAAI.
[188] Albert S. Huang,et al. Estimation, planning, and mapping for autonomous flight using an RGB-D camera in GPS-denied environments , 2012, Int. J. Robotics Res..
[189] Martial Hebert,et al. Contextual Sequence Prediction with Application to Control Library Optimization , 2012, Robotics: Science and Systems.
[190] Thorsten Joachims,et al. Online learning to diversify from implicit feedback , 2012, KDD.
[191] Andreas Vlachos,et al. An investigation of imitation learning algorithms for structured prediction , 2012, EWRL.
[192] Sanjeev Arora,et al. The Multiplicative Weights Update Method: a Meta-Algorithm and Applications , 2012, Theory Comput..
[193] Frank Dellaert,et al. Saliency detection and model-based tracking: a two part vision system for small robot navigation in forested environment , 2012, Defense, Security, and Sensing.
[194] Martial Hebert,et al. Learning monocular reactive UAV control in cluttered natural environments , 2012, 2013 IEEE International Conference on Robotics and Automation.
[195] Martial Hebert,et al. Efficient 3-D scene analysis from streaming data , 2013, 2013 IEEE International Conference on Robotics and Automation.
[196] M. Hebert,et al. Efficient temporal consistency for streaming video scene analysis , 2013, 2013 IEEE International Conference on Robotics and Automation.
[197] Yisong Yue,et al. Learning Policies for Contextual Submodular Prediction , 2013, ICML.
[198] Yisong Yue,et al. Knapsack Constrained Contextual Submodular List Prediction with Application to Multi-document Summarization , 2013, ArXiv.
[199] Sanjiv Kumar,et al. Discriminative Random Fields , 2006, International Journal of Computer Vision.