Whenever possible, I will post notes on some of the material covered in class, but that is not guaranteed. This is a graduate class and you are responsible for taking notes in class and reading the appropriate chapters of the textbooks.
The notes will be updated as we move along in the course. Please check the dates on the first page to keep track.
Introduction, course overview, examples of MDPs.
Dynamical model of Markov decision processes (Notes on MDP examples) (Incomplete notes on MDP theory)
Inventory management example
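The inventory example above can be sketched in a few lines. This is a minimal illustration, not the model from class: it assumes a lost-sales setting with uniform demand, linear holding/shortage costs, and an order-up-to (base-stock) policy; all parameter values are made up for the demo.

```python
import random

def simulate_base_stock(S, T=10000, h=1.0, p=3.0, seed=0):
    """Average per-period cost of an order-up-to-S policy (illustrative model).

    Each period: order up to level S (delivery is instantaneous), then a
    random demand arrives; leftover stock pays holding cost h per unit,
    unmet demand pays shortage cost p per unit and is lost.
    """
    rng = random.Random(seed)
    total = 0.0
    x = 0  # inventory on hand
    for _ in range(T):
        x = max(x, S)               # order up to S
        d = rng.randint(0, 9)       # demand, uniform on {0, ..., 9}
        x -= d
        total += h * max(x, 0) + p * max(-x, 0)
        x = max(x, 0)               # lost sales: inventory cannot go negative
    return total / T

# Search over candidate base-stock levels.
best = min(range(12), key=lambda S: simulate_base_stock(S))
```

The newsvendor critical ratio p/(p+h) = 0.75 suggests the best order-up-to level sits near the 75th percentile of demand, which the simulation search recovers.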
Probabilistic model of MDP and example on call options
Optimal choice problems (Notes on MDP examples)
For a history of the secretary problem, see T.S. Ferguson, "Who solved the secretary problem?", Statistical Science, vol. 4, no. 3, pp. 282–296, 1989.
For a detailed description of optimal choice problem and its variations, see Chapter 2 of Ferguson, "Optimal Stopping and Applications".
For a popular account of optimal stopping problems, read the American Scientist article "Knowing When to Stop".
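The classical 1/e rule for the secretary problem is easy to check by simulation. The sketch below is illustrative (candidate count, trial count, and seed are arbitrary): observe the first n/e candidates without committing, then accept the first candidate better than everyone seen so far.

```python
import math
import random

def simulate_secretary(n=50, trials=20000, seed=1):
    """Estimate the success probability of the 1/e stopping rule:
    skip the first r candidates, then take the first record-setter."""
    rng = random.Random(seed)
    r = round(n / math.e)             # observation phase length ~ n/e
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))        # 0 denotes the best candidate
        rng.shuffle(ranks)
        best_seen = min(ranks[:r])
        chosen = None
        for x in ranks[r:]:
            if x < best_seen:         # first candidate beating the sample
                chosen = x
                break
        if chosen == 0:               # success iff the overall best was chosen
            wins += 1
    return wins / trials

p = simulate_secretary()
```

The estimate lands close to 1/e ≈ 0.368, matching the asymptotic optimal success probability discussed in Ferguson's survey.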
Interchange argument and notion of state
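A standard instance of the interchange argument is minimizing total weighted completion time on one machine: swapping two adjacent jobs i, j changes the cost by w_j p_i − w_i p_j, so sequencing in increasing p/w order (Smith's rule) is optimal. A small brute-force check, with made-up job data:

```python
import itertools

def weighted_completion(jobs, order):
    """Total weighted completion time for jobs = [(proc_time, weight), ...]."""
    t = 0.0
    cost = 0.0
    for i in order:
        p, w = jobs[i]
        t += p            # completion time of job i
        cost += w * t
    return cost

jobs = [(3, 1), (1, 4), (2, 2), (4, 3)]  # illustrative (p, w) pairs

# Smith's rule: sort by processing time / weight, justified by the
# adjacent-interchange argument.
smith = sorted(range(len(jobs)), key=lambda i: jobs[i][0] / jobs[i][1])

# Exhaustive check over all orderings.
brute = min(itertools.permutations(range(len(jobs))),
            key=lambda o: weighted_completion(jobs, o))
```

For this instance both orderings achieve the same (optimal) cost, as the interchange argument predicts.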
Partially observed Markov decision processes (POMDPs)
See Kumar and Varaiya, Sec. 6.4–6.7 for the main results on POMDPs. Sec. 6.5 proves the update equations for the belief state. Also see the notes scribed by Pierre-Luc for the proof given in class.
See Bertsekas Chapter 5 for general discussion on POMDPs.
For details on computational approaches to solving POMDPs, see the lecture notes of Geoff Gordon.
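The belief-state update proved in Kumar and Varaiya, Sec. 6.5 is a one-line Bayes filter. Here is a minimal sketch with illustrative two-state numbers (the matrices are invented for the demo): prediction through the controlled transition matrix, then a pointwise Bayes correction by the observation likelihood.

```python
import numpy as np

def belief_update(b, u, y, P, O):
    """One step of the POMDP belief-state update:
    b'(x') ∝ O[u][x', y] * sum_x b(x) P[u][x, x'].

    P[u] is the transition matrix under action u;
    O[u][x', y] is the probability of observing y in next state x'.
    """
    pred = b @ P[u]               # prediction step
    post = O[u][:, y] * pred      # Bayes correction for observation y
    return post / post.sum()      # normalize

# Two-state, one-action example (illustrative numbers):
P = [np.array([[0.9, 0.1],
               [0.2, 0.8]])]
O = [np.array([[0.7, 0.3],
               [0.4, 0.6]])]
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, u=0, y=0, P=P, O=O)
```

Observation y = 0 is more likely in state 0 here, so the updated belief shifts mass toward state 0.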
Sequential hypothesis testing (Notes on POMDP examples)
The model presented in class is a variation of the model considered in Arrow, Blackwell, and Girshick, "Bayes and Minimax Solutions of Sequential Decision Problems", Econometrica, pp. 213–244, Jul.–Oct. 1949.
See Bertsekas, Sec 5.5 and Whittle Vol II Chapter 40.
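The threshold structure of the optimal sequential test is realized by Wald's SPRT: accumulate the log-likelihood ratio and stop when it exits an interval. A sketch with invented parameters (a fair coin under H0 versus a 0.7-biased coin under H1):

```python
import math
import random

def sprt(sample, llr, alpha=0.05, beta=0.05, max_steps=10000):
    """Wald's sequential probability ratio test (sketch).

    sample() draws one observation; llr(x) returns log f1(x)/f0(x).
    Stop when the cumulative LLR exits (log B, log A), where the
    thresholds come from Wald's approximations for error levels
    alpha (type I) and beta (type II).
    """
    log_A = math.log((1 - beta) / alpha)   # upper threshold -> accept H1
    log_B = math.log(beta / (1 - alpha))   # lower threshold -> accept H0
    s = 0.0
    for n in range(1, max_steps + 1):
        s += llr(sample())
        if s >= log_A:
            return "H1", n
        if s <= log_B:
            return "H0", n
    return "undecided", max_steps

# Illustrative run: data generated under H1 (bias 0.7).
rng = random.Random(2)
llr_coin = lambda x: math.log(0.7 / 0.5) if x else math.log(0.3 / 0.5)
decision, n = sprt(lambda: rng.random() < 0.7, llr_coin)
```

With data from H1, the test accepts H1 in roughly 95% of runs, after far fewer samples on average than a comparable fixed-sample test.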
Measurement scheduling in sensor networks
Stochastic orders and monotone value functions in POMDPs
The material of this lecture is adapted from Lovejoy, "Some Monotonicity Results for Partially Observed Markov Decision Processes".
See lecture notes on MDP theory.
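The monotonicity results in Lovejoy's paper rest on comparing beliefs in the monotone likelihood ratio (MLR) order, which is stronger than first-order stochastic dominance. A minimal sketch of both checks for pmfs on a finite ordered state space (examples are made up):

```python
def mlr_leq(p, q, tol=1e-12):
    """p <=_MLR q for pmfs on {0, ..., n-1}: the ratio q[i]/p[i] is
    nondecreasing in i, i.e. q[i]*p[j] <= p[i]*q[j] for all i <= j."""
    n = len(p)
    return all(q[i] * p[j] <= p[i] * q[j] + tol
               for i in range(n) for j in range(i, n))

def fosd_leq(p, q, tol=1e-12):
    """First-order stochastic dominance p <=_st q:
    every upper tail of p is no larger than the same tail of q."""
    tail_p = tail_q = 0.0
    for k in range(len(p) - 1, 0, -1):
        tail_p += p[k]
        tail_q += q[k]
        if tail_p > tail_q + tol:
            return False
    return True
```

MLR dominance implies first-order stochastic dominance, but not conversely; for example, [0.4, 0.2, 0.4] is stochastically dominated by [0.2, 0.4, 0.4] without being MLR-ordered.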
Optimal replacement in machine repair
The material of this lecture is adapted from Lu Jin, Kazuhiro Kumagai, and Kazuyuki Suzuki, "Control Limit Policy for Partially Observable Markov Decision Process Based on Stochastic Increasing Ordering", and S.C. Albright, "Structural results for partially observable Markov decision processes".
Also see Vikram Krishnamurthy, "Myopic Bounds for Optimal Policy of POMDPs: A minor extension of Lovejoy's structural results".
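The control-limit structure can be seen already in a fully observed replacement MDP; the sketch below uses invented numbers, not the model from the papers above. With deterioration levels ordered by wear, increasing operating costs, and stochastically monotone deterioration, value iteration produces a policy that replaces exactly above a threshold.

```python
import numpy as np

def replacement_policy(n=6, c_repl=4.0, gamma=0.95, iters=2000):
    """Value iteration for a simple machine-replacement MDP (illustrative).

    State = deterioration level 0..n-1.  'Keep' pays an operating cost
    equal to the state index; 'replace' pays c_repl, resets the machine
    to state 0, and operates it from there this period."""
    # Deterioration kernel: stay put w.p. 0.5, worsen one level w.p. 0.5.
    P = np.zeros((n, n))
    for s in range(n - 1):
        P[s, s] = 0.5
        P[s, s + 1] = 0.5
    P[n - 1, n - 1] = 1.0            # worst state is absorbing
    cost = np.arange(n, dtype=float)

    V = np.zeros(n)
    for _ in range(iters):
        keep = cost + gamma * P @ V
        repl = c_repl + cost[0] + gamma * P[0] @ V
        V = np.minimum(keep, repl)

    keep = cost + gamma * P @ V
    repl = c_repl + cost[0] + gamma * P[0] @ V
    return (repl < keep).astype(int)  # 1 = replace

policy = replacement_policy()
```

The computed policy keeps the machine in good states and replaces in bad ones, with a single switching point, i.e. a control limit.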
Linear Quadratic Gaussian systems
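The computational core of the LQG problem is the backward Riccati recursion for the LQR gains; by certainty equivalence, the same gains are optimal when Gaussian noise is added and the state is replaced by the Kalman filter estimate. A sketch on an illustrative double-integrator system:

```python
import numpy as np

def lqr_gains(A, B, Q, R, T):
    """Backward Riccati recursion for finite-horizon LQR.

    Returns the time-indexed gains K_0, ..., K_{T-1} for the control
    law u_t = -K_t x_t, and the final cost-to-go matrix."""
    P = Q.copy()
    gains = []
    for _ in range(T):
        # K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati step: P <- Q + A' P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1], P

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # double integrator (illustrative)
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
gains, P = lqr_gains(A, B, Q, R, T=50)
```

For a horizon this long the recursion has essentially converged, so the first gain is close to the stationary LQR gain and the closed-loop matrix A − BK is stable.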