Whenever possible, I will post notes on some of the material covered in class, but that is not guaranteed. This is a graduate class and you are responsible for taking notes in class and reading the appropriate chapters of the textbooks.
The notes will be updated as we move along in the course. Please check the dates on the first page to keep track.
Introduction, course overview, examples of MDPs.
Dynamical model of Markov decision processes (Notes on MDP examples) (Incomplete notes on MDP theory)
Optimal forest management problem
Statement and proof of dynamic programming decomposition
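The dynamic programming decomposition can be sketched as a backward induction. The two-state, two-action model below is purely illustrative (the states, rewards, and transition probabilities are hypothetical, not taken from the notes):

```python
# Backward induction for a finite-horizon MDP: solve one stage at a time,
# starting from the terminal value function.

states = [0, 1]
actions = ["a", "b"]
T = 5  # horizon

def reward(x, u):
    # state 1 is the "good" state; action "a" costs a little extra
    return x - (0.1 if u == "a" else 0.0)

def trans(x, u):
    # P(next state = 1 | x, u): action "a" reaches state 1 more often
    p1 = 0.8 if u == "a" else 0.4
    return {0: 1 - p1, 1: p1}

V = {T: {x: 0.0 for x in states}}   # terminal condition V_T = 0
policy = {}
for t in range(T - 1, -1, -1):      # the DP decomposition: one stage at a time
    V[t], policy[t] = {}, {}
    for x in states:
        q = {u: reward(x, u)
                + sum(p * V[t + 1][y] for y, p in trans(x, u).items())
             for u in actions}
        policy[t][x] = max(q, key=q.get)
        V[t][x] = q[policy[t][x]]
```

Since the reward is increasing in the state and the transitions do not depend on it, the value functions here differ by exactly the immediate-reward gap at every stage.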
Explicit solution of optimal inventory management problem
Optimality of monotone strategies (Incomplete notes on MDP theory)
Monotone dynamic programming
Most of the material was adapted from Section 4.7 of Puterman.
Power-delay tradeoff in wireless communication (Notes on MDP examples)
Showing that the optimal strategy is monotone.
Most of the material is adapted from Randy Berry's thesis.
No class
Optimal stopping problems (Notes on MDP examples) (Incomplete notes on MDP theory)
Call options and the secretary problem
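The optimal rule for the secretary problem is a cutoff rule, and its success probability has a closed form that can be checked numerically. A minimal sketch (the instance size n = 100 is illustrative):

```python
# Secretary problem: reject the first r-1 candidates, then accept the
# first candidate better than all seen so far.
# Success probability: P(r) = ((r-1)/n) * sum_{i=r}^{n} 1/(i-1) for r > 1.

def success_prob(r, n):
    if r == 1:
        return 1.0 / n  # accept the very first candidate
    return (r - 1) / n * sum(1.0 / (i - 1) for i in range(r, n + 1))

n = 100
best_r = max(range(1, n + 1), key=lambda r: success_prob(r, n))
# best_r is close to n/e, and the success probability is close to 1/e
```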
For a detailed description of optimal stopping problems, see Ferguson, "Optimal Stopping and Applications".
For a popular account of optimal stopping problems, read the American Scientist article on Knowing when to stop
For a history of the secretary problem, see T.S. Ferguson, "Who solved the secretary problem?". Statistical science 4: 282–296, 1989.
The material on optimality of threshold strategies is adapted from Sechan Oh's thesis.
Interchange argument
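A classical use of the interchange argument is single-machine scheduling: swapping two adjacent jobs that violate the ratio order w/p never decreases the total weighted completion time, so sorting by decreasing w/p (Smith's rule) is optimal. A small brute-force check on a hypothetical instance:

```python
# Interchange-argument sanity check: on one machine, ordering jobs by
# decreasing w/p minimizes sum_j w_j C_j; brute force over all
# permutations of a small instance confirms it.
from itertools import permutations

jobs = [(3, 2.0), (1, 1.0), (4, 3.0), (2, 5.0)]  # (processing time p, weight w)

def cost(order):
    t, total = 0.0, 0.0
    for p, w in order:
        t += p           # completion time of this job
        total += w * t   # accumulate weighted completion time
    return total

smith = sorted(jobs, key=lambda j: j[1] / j[0], reverse=True)
brute_best = min(cost(list(o)) for o in permutations(jobs))
```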
Linear Quadratic Regulator
See Notes on LQG.
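The finite-horizon LQR solution is a backward Riccati recursion. A scalar sketch with illustrative parameters a = b = q = r = 1 (chosen so the recursion converges to the golden ratio, which makes the fixed point easy to check by hand):

```python
# Scalar finite-horizon LQR: x_{t+1} = a x_t + b u_t + w_t,
# stage cost q x_t^2 + r u_t^2.  Backward Riccati recursion:
#   P_{t} = q + a^2 P_{t+1} - (a b P_{t+1})^2 / (r + b^2 P_{t+1}),
# with optimal gain u_t = -K_t x_t, K_t = a b P_{t+1} / (r + b^2 P_{t+1}).

a, b, q, r = 1.0, 1.0, 1.0, 1.0
p = q          # terminal condition P_T = q
gains = []
for _ in range(100):  # iterate backward in time
    k = (a * b * p) / (r + b * b * p)
    p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    gains.append(k)
```

With these numbers the fixed point satisfies p² = p + 1, i.e. p = (1 + √5)/2, and the stationary gain is 1/p.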
The idea of certainty equivalence was proposed by Simon, “Dynamic Programming Under Uncertainty with a Quadratic Criterion Function” and extended by Theil, “A Note on Certainty Equivalence in Dynamic Planning”.
For a generalization of these ideas, see Duchan, “A Clarification and a New Proof of the Certainty Equivalence Theorem”.
Partially observed Markov decision processes (POMDPs)
See Kumar and Varaiya, Sec 6.4–6.7 for the main results on POMDPs. Sec 6.5 proves the update equations for the belief state.
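The belief-state update of Sec 6.5 is a one-line Bayes rule: b'(x') ∝ P(y | x') Σₓ P(x' | x, u) b(x). A minimal sketch on a hypothetical two-state machine with a noisy binary observation (all numbers are illustrative):

```python
# POMDP belief update: predict through the transition kernel, then
# condition on the observation and normalize.

P = {"u": [[0.9, 0.1],     # P[u][x][x']: transition matrix under control "u"
           [0.3, 0.7]]}
O = [[0.8, 0.2],           # O[x'][y]: observation likelihoods
     [0.2, 0.8]]

def update(b, u, y):
    pred = [sum(b[x] * P[u][x][xn] for x in range(2)) for xn in range(2)]
    unnorm = [O[xn][y] * pred[xn] for xn in range(2)]
    z = sum(unnorm)        # probability of observing y; must be > 0
    return [v / z for v in unnorm]

b1 = update([0.5, 0.5], "u", y=1)
```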
For a description of the information state, see Kumar and Varaiya, Sec 6.4, and Mahajan and Mannan, “Decentralized Stochastic Control”.
See Bertsekas, Chapter 5 for a general discussion of POMDPs.
For details on computational approaches for solving POMDPs, see the lecture notes of Geoff Gordon.
Sequential hypothesis testing (Notes on POMDP examples)
The model presented in class is a variation of the model considered in Arrow, Blackwell, and Girshick, “Bayes and Minimax Solutions of Sequential Decision Problems”, Econometrica, pp. 213–244, Jul.–Oct. 1949.
See Bertsekas, Sec 5.5 and Whittle Vol II Chapter 40.
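The solution to the sequential hypothesis testing problem is a two-threshold rule on the likelihood ratio, i.e., Wald's SPRT. A sketch on a hypothetical Bernoulli model (biases, error levels, and the standard threshold approximations A = log((1−β)/α), B = log(β/(1−α)) are illustrative):

```python
# Sequential probability ratio test: continue sampling while the
# log-likelihood ratio stays between two thresholds.
import math

p0, p1 = 0.2, 0.8          # H0: bias 0.2, H1: bias 0.8
alpha = beta = 0.05
A = math.log((1 - beta) / alpha)   # accept H1 when llr >= A
B = math.log(beta / (1 - alpha))   # accept H0 when llr <= B

def sprt(observations):
    llr = 0.0
    for n, y in enumerate(observations, start=1):
        llr += math.log((p1 if y else 1 - p1) / (p0 if y else 1 - p0))
        if llr >= A:
            return "H1", n
        if llr <= B:
            return "H0", n
    return "continue", len(observations)

decision, n = sprt([1, 1, 1, 1, 1])
```

Each heads observation adds log 4 ≈ 1.386 to the ratio, so three heads in a row cross the upper threshold log 19 ≈ 2.944.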
Measurement scheduling in sensor networks
Stochastic orders and monotone value functions in POMDPs
The material of this lecture is adapted from Lovejoy, "Some Monotonicity Results for Partially Observed Markov Decision Processes".
Also see Vikram Krishnamurthy, "Structural results for partially observable Markov decision processes".
See lecture notes on MDP theory.
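The basic ingredient in these monotonicity results is a stochastic order on distributions; the simplest is first-order stochastic dominance, which reduces to comparing tail sums. A sketch with illustrative distributions:

```python
# First-order stochastic dominance: p >=_st q iff the tail sums of p
# are at least those of q at every point.

def fosd(p, q, tol=1e-12):
    tail_p = tail_q = 0.0
    for pi, qi in zip(reversed(p), reversed(q)):
        tail_p += pi
        tail_q += qi
        if tail_p < tail_q - tol:
            return False
    return True

# p shifts probability mass toward higher states relative to q
p = [0.1, 0.3, 0.6]
q = [0.3, 0.4, 0.3]
```

(The results of Lovejoy and Krishnamurthy typically need the stronger monotone likelihood ratio order on beliefs, which implies first-order dominance.)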
Linear Quadratic Gaussian systems
Static teams
The proof presented in class is adapted from Mahajan, Martins, and Yuksel, "Static LQG teams with countably infinite players".
The original result was proved in Radner, "Team Decision Problems". Also see Marschak and Radner, "Economic Theory of Teams".
Witsenhausen's Counterexample
Partially nested information structure
Common information approach for decentralized control
Most of the material presented in class is taken from this overview paper.
For the delayed sharing model, see the papers by Witsenhausen; Varaiya and Walrand; and Nayyar, Mahajan, and Teneketzis.
The idea of common information also works for general systems: for Markovian models, see this paper; for LQG models, see this paper.