Instructor
David Meger
david.meger@X
Office: McConnell 112N
(do not come there physically, but remember the real world!)
Office Hours: by appointment (booking information on My Courses internal site)
X = mcgill.ca

TA
Xiru Zhu
xiru.zhu@X
Office Hours: by appointment (booking information on My Courses internal site)
X = mcgill.ca

News

Jan 7, 2021: The first lecture will happen today on Zoom at [this link]. We will only give general background on the course and topic area; nothing examinable will be covered. This gives folks time to find the course without missing anything (or to extend their much-needed holiday!).

Overview

COMP 765 is a research seminar designed to teach the background necessary for conducting research in learning, adaptation, and deliberation for robot-like systems. Because the course lives in Computer Science, our focus is on the design of algorithms and their properties, but this is an area of CS that must touch the real world, so a healthy dose of background on mechanisms, sensors, and their physics will be mixed in. In 2021, McGill's Reinforcement Learning course will not be offered, so COMP 765 has been re-titled RL for Robotics, giving students a chance to gain some of the RL exposure they are looking for. We will not cover RL theory as deeply or as broadly as a dedicated RL course, but 765 will give sufficient background, especially in Deep RL for continuous control, for a student to easily begin research after this course.

We will broadly cover the areas listed in the schedule below.

Schedule

(As we go, this will be populated with the slides, required reading links, and information)
Jan 7: Introduction
Topics: Overview of ideas in the course.
Slides: 1 - Introduction
Jan 12: MDP Formulations and Solutions to Linear Systems
Topics: Defining MDP terminology and related problems. Linear quadratic regulators. (See the illustrative LQR sketch below.)
Slides: 2 - MDPs, Value functions and Roadmap; 3 - LQR and Trajectory Optimization
References: RL book Chapters 1 and 3; Stephen Boyd's LQR notes and examples; Pieter Abbeel's LQR lecture
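For a concrete preview of the finite-horizon LQR solution, here is a minimal, illustrative sketch (not official course code) of the backward Riccati recursion in Python with NumPy; the double-integrator matrices at the bottom are made-up placeholders rather than anything from the lectures.

import numpy as np

def finite_horizon_lqr(A, B, Q, R, Qf, T):
    """Backward Riccati recursion for x_{t+1} = A x_t + B u_t with stage
    cost x'Qx + u'Ru and terminal cost x'Qf x. Returns gains K_t so that
    u_t = -K_t x_t is the optimal linear feedback."""
    P = Qf
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return list(reversed(gains))  # gains[t] is the gain to apply at time t

# Placeholder double-integrator example (illustrative only).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R, Qf = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)
gains = finite_horizon_lqr(A, B, Q, R, Qf, T=50)
print(gains[0])  # feedback gain for the first time step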
Jan 19: Discrete Policy Evaluation and Improvement
Topics: Discrete approximations of robots. Tabular solutions to prediction, value and policy iteration. (See the illustrative value iteration sketch below.)
Slides: 4 - Policy Eval; 5 - Policy Improvement
References: RL Book Chapters 3 and 4
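As an illustrative sketch of the tabular methods covered this week (not official course code), here is a minimal value iteration loop in Python; the transition-tensor layout P[a, s, s'] and reward table R[s, a] are conventions assumed only for this example.

import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Tabular value iteration for a known MDP.
    P[a, s, s'] is the transition probability, R[s, a] the expected reward."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # optimal values and a greedy policy
        V = V_new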
Jan 26: Reinforcement Learning from experience on discrete problems
Topics: Monte-Carlo and Temporal Difference methods. Sarsa. Q-Learning. (See the illustrative Q-learning sketch below.)
Slides: 6 - Online RL from data
References: RL Book Chapters 5, 6 and 13
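As an illustrative sketch of tabular Q-learning (not official course code), here is one episode of the update loop in Python; it assumes an environment with the classic OpenAI Gym-style reset()/step() interface, which is just a convention chosen for this example.

import numpy as np

def q_learning_episode(env, Q, alpha=0.1, gamma=0.99, eps=0.1):
    """One episode of tabular Q-learning. `env` is assumed to follow the
    classic Gym-style API: reset() -> s, step(a) -> (s', r, done, info)."""
    s = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < eps:
            a = np.random.randint(Q.shape[1])
        else:
            a = int(np.argmax(Q[s]))
        s_next, r, done, _ = env.step(a)
        # off-policy TD target uses the greedy value of the next state
        target = r + (0.0 if done else gamma * np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
    return Q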
Feb 2: Continuous state-action RL and Function Approximation
Topics: Policy Gradient Theorem and Actor-Critic approaches. Scaling up policy improvement. Linear to deep function approximation. First research papers. (See the illustrative REINFORCE sketch below.)
Slides: 7 - Policy Gradients; 8 - Function Approximation
References: RL Book Chapters 9, 10, and 11; Neural Fitted Q and Least-squares TD
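As an illustrative sketch of the Monte-Carlo policy gradient (REINFORCE) idea behind this week's material, here is a minimal update for a linear softmax policy in Python; the feature function phi(s) and the episode format are assumptions made only for this example, not part of the course materials.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(theta, episode, gamma=0.99, lr=0.01):
    """One REINFORCE update for a linear softmax policy with
    pi(a|s) proportional to exp(theta[:, a] . phi(s)).
    `episode` is a list of (phi_s, a, r) tuples from one rollout."""
    # compute the return G_t following each step
    G = 0.0
    returns = []
    for _, _, r in reversed(episode):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    # gradient ascent on sum_t G_t * grad log pi(a_t | s_t)
    for (phi_s, a, _), G_t in zip(episode, returns):
        probs = softmax(theta.T @ phi_s)       # action probabilities in state s_t
        grad_log = -np.outer(phi_s, probs)     # -phi(s) pi(b|s) term for every action b
        grad_log[:, a] += phi_s                # plus the indicator term for the taken action
        theta += lr * G_t * grad_log
    return theta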
Feb 9: Model-based RL
Topics: Dyna. Optimal control with learning. Probabilistic dynamics learning and belief-space policy updates. (See the illustrative GP regression sketch below.)
Slides: 9 - Model-based RL; 10 - GPs
References: RL Book Chapter 8; Dyna by Sutton; A GP Tutorial; A GP textbook; PILCO by Marc Deisenroth
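Since PILCO and the GP references above build on Gaussian process dynamics models, here is a minimal, illustrative sketch of GP regression with a squared-exponential kernel in Python; the hyperparameters and the toy data at the bottom are placeholders, not anything from the course materials.

import numpy as np

def gp_posterior(X, y, X_star, length=1.0, sigma_f=1.0, sigma_n=0.1):
    """Posterior mean and variance of a GP with a squared-exponential kernel,
    the kind of probabilistic regression used for learned dynamics models."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sigma_f**2 * np.exp(-0.5 * d2 / length**2)
    K = k(X, X) + sigma_n**2 * np.eye(len(X))
    K_s = k(X, X_star)
    K_ss = k(X_star, X_star)
    alpha = np.linalg.solve(K, y)
    mean = K_s.T @ alpha
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, np.diag(cov)

# Toy 1-D example with placeholder data (illustrative only).
X = np.linspace(0, 1, 10).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0])
mu, var = gp_posterior(X, y, np.array([[0.5]]))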
Feb 16: Deep RL Student Presentations #1
Topics: Progress of methods from 2013 to present. Connections of recent methods to theory.
References: Playing Atari with Deep RL; Rainbow: Combining Improvements in Deep Reinforcement Learning; Continuous control with deep reinforcement learning; Proximal Policy Optimization Algorithms
Feb 23: Deep RL Student Presentations #2
Topics: Progress of methods from 2013 to present. Connections of recent methods to theory.
References: Addressing Function Approximation Error in Actor-Critic Methods; Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor; A Distributional Perspective on Reinforcement Learning; Maximum a Posteriori Policy Optimisation
March 2: Reading Week
No lectures - enjoy the rest!
March 9: Status of RL for Robotics
Topics: Latest accomplishments and challenges in research papers. Taxonomy of historical and current efforts. Examining demonstrations and benchmarks.
Slides: 11 - Status of RL for Robotics
References: RSS 2020 sim2real workshop; LaValle text Ch. 1; Underactuated Robotics Section 1
March 16: Imitation Learning
Topics: Paradigms for imitation including LfD, IRL and AfP. Passively learning from experts and active demonstration-seeking.
Slides: 12 - Cloning, Apprenticeship, Batch RL
References: DAGGER; Apprenticeship Learning; Batch Constrained Q-Learning
March 23: Behavior generalization and transfer
Topics: Brief peek at robust control. Inverse-dynamics adaptation. Transfer via imitation. Meta and Multi-task RL. Sim2Real transfer.
Slides: 13 - Simple transfer
References: Policy Adjustment paper; T-Resilience paper; Cascaded GP Learning; Mutual-Alignment Transfer Learning
March 30: Learning from Robotic Vision
Topics: Scaling up our methods to visual inputs and real sensors. Visual foresight, multi-modal methods. Latent-space planning.
Slides: 14 - Learning from Vision
April 6: Student presentations in state-of-the-art RL on robots
References: Papers selected by students
April 13: Project Demonstrations and Wrap-Up

Assignments

Assignments will be due every 2 weeks and will appear here when distributed. They will mix pen-and-paper questions with short coding work.

Marking scheme

Recommended, but optional, textbooks

Related courses

Diversity and Inclusion

Robotics is one of the most important technologies in our world today, and robotics skills will be among the most important skill-sets people use to influence the world in our lifetimes. This knowledge should be shared equally by all. Our goal is to make this content equally accessible to students of all backgrounds, and we work to proactively acknowledge and address any bias that may occur during the term. Equal treatment of students of every gender, race and orientation is a top priority. We openly welcome suggestions on how to improve inclusion; please contact the instructor either with your name or anonymously.

Disclaimers

McGill University values academic integrity. Therefore all students must understand the meaning and consequences of cheating, plagiarism and other academic offenses under the Code of Student Conduct and Disciplinary Procedures (see (this link) for more information). In accord with McGill University's Charter of Students' Rights, students in this course have the right to submit in English or in French any written work that is to be graded. In the event of extraordinary circumstances beyond the University's control, the content and/or evaluation scheme in this course is subject to change.