Instructor
David Meger
david.meger@X
Office: McConnell 112N
(do not come there physically, but remember the real world!)
Office Hours: by appointment (booking information on My Courses internal site)
X = mcgill.ca

TA
Xiru Zhu
xiru.zhu@X
Office Hours: by appointment (booking information on My Courses internal site)
X = mcgill.ca

News

Jan 7, 2021: The first lecture will happen today on Zoom at [this link]. We will only give general background on the course and topic area; nothing examinable will be covered. This gives folks time to find the course without missing anything (or to extend their much-needed holiday!).

Overview

COMP 765 is a research seminar designed to teach the background necessary for conducting research in learning, adaptation, and deliberation for robot-like systems. Because the course lives in Computer Science, our focus is on the design of algorithms and their properties, but this is an area of CS that must touch the real world, so a healthy dose of background on mechanisms, sensors, and their physics will be mixed in. In 2021, McGill's Reinforcement Learning course will not be offered, so COMP 765 has been re-titled RL for Robotics, giving students a chance to gain some of the RL exposure they are looking for. We will not cover RL theory as deeply or as broadly as a dedicated RL course, but 765 will give sufficient background, especially in Deep RL for continuous control, for a student to easily begin research after this course.

We will broadly cover the areas listed in the schedule below.

Schedule

(As we go, this will be populated with the slides, required reading links, and information)
Jan 7: Introduction
Topics: Overview of ideas in the course.
Slides: 1 - Introduction
Jan 12: MDP Formulations and Solutions to Linear Systems
Topics: Defining MDP terminology and related problems. Linear quadratic regulators. (See the illustrative LQR sketch below.)
Slides: 2 - MDPs, Value functions and Roadmap; 3 - LQR and Trajectory Optimization
References: RL book Chapters 1 and 3; Stephen Boyd's LQR notes and examples; Pieter Abbeel's LQR lecture
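For a concrete preview of the finite-horizon LQR solution, here is a minimal, illustrative sketch (not official course code) of the backward Riccati recursion in Python with NumPy; the double-integrator matrices at the bottom are made-up placeholders rather than anything from the lectures.

import numpy as np

def finite_horizon_lqr(A, B, Q, R, Qf, T):
    """Backward Riccati recursion for x_{t+1} = A x_t + B u_t with stage
    cost x'Qx + u'Ru and terminal cost x'Qf x. Returns gains K_t so that
    u_t = -K_t x_t is the optimal linear feedback."""
    P = Qf
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return list(reversed(gains))  # gains[t] is the gain to apply at time t

# Placeholder double-integrator example (illustrative only).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R, Qf = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)
gains = finite_horizon_lqr(A, B, Q, R, Qf, T=50)
print(gains[0])  # feedback gain for the first time step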
Jan 19: Discrete Policy Evaluation and Improvement
Topics: Discrete approximations of robots. Tabular solutions to prediction, value and policy iteration. (See the illustrative value iteration sketch below.)
Slides: 4 - Policy Eval; 5 - Policy Improvement
References: RL Book Chapters 3 and 4
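As an illustrative sketch of the tabular methods covered this week (not official course code), here is a minimal value iteration loop in Python; the transition-tensor layout P[a, s, s'] and reward table R[s, a] are conventions assumed only for this example.

import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Tabular value iteration for a known MDP.
    P[a, s, s'] is the transition probability, R[s, a] the expected reward."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # optimal values and a greedy policy
        V = V_new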
Jan 26: Reinforcement Learning from experience on discrete problems
Topics: Monte-Carlo and Temporal Difference methods. Sarsa. Q-Learning. (See the illustrative Q-learning sketch below.)
Slides: 6 - Online RL from data
References: RL Book Chapters 5, 6 and 13
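As an illustrative sketch of tabular Q-learning (not official course code), here is one episode of the update loop in Python; it assumes an environment with the classic OpenAI Gym-style reset()/step() interface, which is just a convention chosen for this example.

import numpy as np

def q_learning_episode(env, Q, alpha=0.1, gamma=0.99, eps=0.1):
    """One episode of tabular Q-learning. `env` is assumed to follow the
    classic Gym-style API: reset() -> s, step(a) -> (s', r, done, info)."""
    s = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < eps:
            a = np.random.randint(Q.shape[1])
        else:
            a = int(np.argmax(Q[s]))
        s_next, r, done, _ = env.step(a)
        # off-policy TD target uses the greedy value of the next state
        target = r + (0.0 if done else gamma * np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
    return Q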
Feb 2: Continuous state-action RL and Function Approximation
Topics: Policy Gradient Theorem and Actor-Critic approaches. Scaling up policy improvement. Linear to deep function approximation. First research papers. (See the illustrative REINFORCE sketch below.)
Slides: 7 - Policy Gradients; 8 - Function Approximation
References: RL Book Chapters 9, 10, and 11; Neural Fitted Q and Least-squares TD
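As an illustrative sketch of the Monte-Carlo policy gradient (REINFORCE) idea behind this week's material, here is a minimal update for a linear softmax policy in Python; the feature function phi(s) and the episode format are assumptions made only for this example, not part of the course materials.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(theta, episode, gamma=0.99, lr=0.01):
    """One REINFORCE update for a linear softmax policy with
    pi(a|s) proportional to exp(theta[:, a] . phi(s)).
    `episode` is a list of (phi_s, a, r) tuples from one rollout."""
    # compute the return G_t following each step
    G = 0.0
    returns = []
    for _, _, r in reversed(episode):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    # gradient ascent on sum_t G_t * grad log pi(a_t | s_t)
    for (phi_s, a, _), G_t in zip(episode, returns):
        probs = softmax(theta.T @ phi_s)       # action probabilities in state s_t
        grad_log = -np.outer(phi_s, probs)     # -phi(s) pi(b|s) term for every action b
        grad_log[:, a] += phi_s                # plus the indicator term for the taken action
        theta += lr * G_t * grad_log
    return theta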
Feb 9: Model-based RL
Topics: Dyna. Optimal control with learning. Probabilistic dynamics learning and belief-space policy updates. (See the illustrative GP regression sketch below.)
Slides: 9 - Model-based RL; 10 - GPs
References: RL Book Chapter 8; Dyna by Sutton; A GP Tutorial; A GP textbook; PILCO by Marc Deisenroth
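Since PILCO and the GP references above build on Gaussian process dynamics models, here is a minimal, illustrative sketch of GP regression with a squared-exponential kernel in Python; the hyperparameters and the toy data at the bottom are placeholders, not anything from the course materials.

import numpy as np

def gp_posterior(X, y, X_star, length=1.0, sigma_f=1.0, sigma_n=0.1):
    """Posterior mean and variance of a GP with a squared-exponential kernel,
    the kind of probabilistic regression used for learned dynamics models."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sigma_f**2 * np.exp(-0.5 * d2 / length**2)
    K = k(X, X) + sigma_n**2 * np.eye(len(X))
    K_s = k(X, X_star)
    K_ss = k(X_star, X_star)
    alpha = np.linalg.solve(K, y)
    mean = K_s.T @ alpha
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, np.diag(cov)

# Toy 1-D example with placeholder data (illustrative only).
X = np.linspace(0, 1, 10).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0])
mu, var = gp_posterior(X, y, np.array([[0.5]]))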
Feb 16: Deep RL Student Presentations #1
Topics: Progress of methods from 2013 to present. Connections of recent methods to theory.
References: Playing Atari with Deep RL; Rainbow: Combining Improvements in Deep Reinforcement Learning; Continuous control with deep reinforcement learning; Proximal Policy Optimization Algorithms
Feb 23: Deep RL Student Presentations #2
Topics: Progress of methods from 2013 to present. Connections of recent methods to theory.
References: Addressing Function Approximation Error in Actor-Critic Methods; Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor; A Distributional Perspective on Reinforcement Learning; Maximum a Posteriori Policy Optimisation
March 2: Reading Week
No lectures - enjoy the rest!
March 9: Status of RL for Robotics
Topics: Latest accomplishments and challenges in research papers. Taxonomy of historical and current efforts. Examining demonstrations and benchmarks.
Slides: 11 - Status of RL for Robotics
References: RSS 2020 sim2real workshop; LaValle text Ch. 1; Underactuated Robotics Section 1
March 16: Imitation Learning
Topics: Paradigms for imitation including LfD, IRL and AfP. Passively learning from experts and active demonstration-seeking.
Slides: 12 - Cloning, Apprenticeship, Batch RL
References: DAGGER; Apprenticeship Learning; Batch Constrained Q-Learning
March 23: Behavior generalization and transfer
Topics: Brief peek at robust control. Inverse-dynamics adaptation. Transfer via imitation. Meta and Multi-task RL. Sim2Real transfer.
Slides: 13 - Simple transfer
References: Policy Adjustment paper; T-Resilience paper; Cascaded GP Learning; Mutual-Alignment Transfer Learning
March 30: Learning from Robotic Vision
Topics: Scaling up our methods to visual inputs and real sensors. Visual foresight, multi-modal methods. Latent-space planning.
Slides: 14 - Learning from Vision
April 6: Student presentations in state-of-the-art RL on robots
References: Papers selected by students
April 13: Project Demonstrations and Wrap-Up

Assignments

Assignments will be due every 2 weeks and will appear here when distributed. They will mix pen-and-paper questions with short coding work.

Marking scheme

Recommended, but optional, textbooks

Related courses

Diversity and Inclusion

Robotics is one of the most important technologies in our world today, and robotics skills will be among the most important skill-sets people use to influence the world in our lifetimes. This knowledge should be shared equally by all. Our goal is to make this content equally accessible to students of all backgrounds, and we work to proactively acknowledge and address any bias that may occur during the term. Equal treatment of students of every gender, race and orientation is a top priority. We openly welcome suggestions on how to improve inclusion; please contact the instructor either with your name or anonymously.

Disclaimers

McGill University values academic integrity. Therefore all students must understand the meaning and consequences of cheating, plagiarism and other academic offenses under the Code of Student Conduct and Disciplinary Procedures (see (this link) for more information). In accord with McGill University's Charter of Students' Rights, students in this course have the right to submit in English or in French any written work that is to be graded. In the event of extraordinary circumstances beyond the University's control, the content and/or evaluation scheme in this course is subject to change.