COMP 765: Reinforcement Learning for Robotics, Winter 2021
Instructor
david.meger@X
Office: McConnell 112N
(do not come there physically, but remember the real world!)
Office Hours: by appointment (booking information on My Courses internal site)
X = mcgill.ca
TA
Xiru Zhu
xiru.zhu@X
Office Hours: by appointment (booking information on My Courses internal site)
X = mcgill.ca
News
Jan 7, 2021: The first lecture will happen today on Zoom at [this link]. We will only give general background on the course and topic area; nothing examinable will be covered. This is to allow folks time to find the course without missing out (or to extend their much-needed holiday!).
Overview
COMP 765 is a research seminar designed to teach the background necessary for conducting research in learning, adaptation, and deliberation for robot-like systems. Being located in Computer Science, our focus is on the design of algorithms and their properties, but this is an area of CS that must touch the real world, so a healthy dose of background on mechanisms, sensors, and their physics will be mixed in. In 2021, McGill's Reinforcement Learning course will not be offered, so COMP 765 is re-titled to RL for Robotics, giving students the chance to gain some of the RL exposure they are looking for. We will not cover RL theory as deeply (or broadly), but 765 will give sufficient background, especially in Deep RL for continuous control, that a student can easily begin research after this course.
We will broadly cover the following areas:
- Background on Optimal Decision Making for Robotics: types of robots, their sensors and actuators, planning and control problems and applications, operations research, human development.
- Finite MDPs: Bellman equations, policy evaluation, on and off-policy methods, value and policy iteration, exploration, temporal difference methods, Q-learning.
- RL with function approximation: policy gradients, actor critic, deep Q-learning, deterministic policy gradients, Batch RL.
- Decision Making Under Uncertainty: learning and control with probabilistic models, time-series probabilistic inference, maximum entropy methods, distributional RL.
- Transfer and Meta-learning: generalization of learned behaviors, policy adaptation, sim2real methods.
- Additional topics to be finalized based on class interest: imitation learning, inverse RL and inverse optimal control, multi-agent team coordination, human-robot interaction, model-based planning, latent-space methods, hierarchical decision making.
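To make the finite-MDP material concrete, here is a minimal sketch of value iteration (the Bellman optimality backup from the "Finite MDPs" unit) on a hypothetical 2-state, 2-action MDP; all transition probabilities and rewards below are made up purely for illustration.

```python
import numpy as np

# Hypothetical MDP: P[a][s, s'] = transition probability, R[a][s] = expected reward
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
     np.array([[0.1, 0.9], [0.7, 0.3]])]   # action 1
R = [np.array([1.0, 0.0]),                 # action 0
     np.array([0.0, 2.0])]                 # action 1
gamma = 0.9  # discount factor

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = np.array([R[a] + gamma * P[a] @ V for a in range(2)])
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:   # stop once the backup is (almost) a fixed point
        break
    V = V_new

policy = Q.argmax(axis=0)  # greedy policy with respect to the converged values
print(V, policy)
```

Because the backup is a gamma-contraction, the loop converges to the unique optimal value function, and the greedy policy extracted from it is optimal for this toy problem.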
Schedule
(As we go, this table will be populated with slides, required reading links, and other information.)

| Week | Topics | Slides | References |
|---|---|---|---|
| Jan 7 | Introduction: overview of the ideas in the course. | 1 - Introduction | |
| Jan 12 | MDP Formulations and Solutions to Linear Systems: defining MDP terminology and related problems; linear quadratic regulators. | 2 - MDPs, Value Functions and Roadmap; 3 - LQR and Trajectory Optimization | RL book, Chapters 1 and 3; Stephen Boyd's LQR notes and examples; Pieter Abbeel's LQR lecture |
| Jan 19 | Discrete Policy Evaluation and Improvement: discrete approximations of robots; tabular solutions to prediction; value and policy iteration. | 4 - Policy Eval; 5 - Policy Improvement | RL book, Chapters 3 and 4 |
| Jan 26 | Reinforcement Learning from Experience on Discrete Problems: Monte Carlo and temporal-difference methods; Sarsa; Q-learning. | 6 - Online RL from Data | RL book, Chapters 5, 6, and 13 |
| Feb 2 | Continuous State-Action RL and Function Approximation: policy gradient theorem and actor-critic approaches; scaling up policy improvement; linear to deep function approximation; first research papers. | 7 - Policy Gradients; 8 - Function Approximation | RL book, Chapters 9, 10, and 11; Neural Fitted Q; least-squares TD |
| Feb 9 | Model-based RL: Dyna; optimal control with learning; probabilistic dynamics learning and belief-space policy updates. | 9 - Model-based RL; 10 - GPs | RL book, Chapter 8; Dyna by Sutton; a GP tutorial; a GP textbook; PILCO by Marc Deisenroth |
| Feb 16 | Deep RL Student Presentations #1: progress of methods from 2013 to present; connections of recent methods to theory. | | Playing Atari with Deep RL; Rainbow: Combining Improvements in Deep Reinforcement Learning; Continuous Control with Deep Reinforcement Learning; Proximal Policy Optimization Algorithms |
| Feb 23 | Deep RL Student Presentations #2: progress of methods from 2013 to present; connections of recent methods to theory. | | Addressing Function Approximation Error in Actor-Critic Methods; Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor; A Distributional Perspective on Reinforcement Learning; Maximum a Posteriori Policy Optimisation |
| March 2 | Reading Week: no lectures, enjoy the rest! | | |
| March 9 | Status of RL for Robotics: latest accomplishments and challenges in research papers; taxonomy of historical and current efforts; examining demonstrations and benchmarks. | 11 - Status of RL for Robotics | RSS 2020 sim2real workshop; LaValle text, Chapter 1; Underactuated Robotics, Section 1 |
| March 16 | Imitation Learning: paradigms for imitation including LfD, IRL, and AfP; passively learning from experts and active demonstration-seeking. | 12 - Cloning, Apprenticeship, Batch RL | DAGGER; Apprenticeship Learning; Batch-Constrained Q-Learning |
| March 23 | Behavior Generalization and Transfer: brief peek at robust control; inverse-dynamics adaptation; transfer via imitation; meta- and multi-task RL; sim2real transfer. | 13 - Simple Transfer | Policy Adjustment paper; T-Resilience paper; Cascaded GP Learning; Mutual-Alignment Transfer Learning |
| March 30 | Learning from Robotic Vision: scaling up our methods to visual inputs and real sensors; visual foresight; multi-modal methods; latent-space planning. | 14 - Learning from Vision | |
| April 6 | Student Presentations in State-of-the-Art RL on Robots: papers selected by students. | | |
| April 13 | Project Demonstrations and Wrap-Up. | | |
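As a taste of the Jan 12 material, here is a minimal sketch of a finite-horizon discrete-time LQR solved by the backward Riccati recursion, on a toy double integrator; the dynamics, cost weights, and horizon below are illustrative assumptions, not course-provided values.

```python
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # position/velocity dynamics (double integrator)
B = np.array([[0.0], [dt]])             # force input enters through velocity
Q = np.eye(2)                           # state cost
Rc = np.array([[0.1]])                  # control cost
T = 50                                  # horizon length

# Backward Riccati recursion: cost-to-go matrices P_t and gains K_t, with u_t = -K_t x_t
P = Q.copy()
gains = []
for _ in range(T):
    K = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
    gains.append(K)
gains.reverse()  # gains were computed backward in time

# Roll the optimal policy out from an initial state; it should drive x toward the origin
x = np.array([[1.0], [0.0]])
for K in gains:
    u = -K @ x
    x = A @ x + B @ u
print(x.ravel())
```

The recursion runs backward from the terminal cost, while the rollout applies the time-indexed gains forward; this forward/backward structure is the same one that reappears in the trajectory-optimization material.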
Assignments
Assignments will be due every two weeks and will appear here when distributed. They will mix pen-and-paper questions with short coding work.
Marking scheme
- Assignments = 40%
- 1 in-class presentation on a technical topic = 10%
- 1 in-class research paper presentation = 10%
- 1 substantial final research project = 40% total:
  - 2-page written project proposal (mid-way through term) = 5%
  - Oral project presentation/demo (last week of classes) = 10%
  - Final project report (end of exam period) = 25%