Whenever possible, I will post notes on some of the material covered in class, but that is not guaranteed. This is a graduate class and you are responsible for taking notes in class and reading the appropriate chapters of the textbooks.
The notes will be updated as we move along in the course. Please check the dates on the first page to keep track.
Introduction, course overview, examples of MDPs.
Dynamical model of Markov decision processes (Notes on MDP examples) (Incomplete notes on MDP theory)
Inventory management example
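The inventory example above can be sketched in a few lines. This is a minimal illustration, not the model from class: it assumes a lost-sales setting with uniform demand, linear holding/shortage costs, and an order-up-to (base-stock) policy; all parameter values are made up for the demo.

```python
import random

def simulate_base_stock(S, T=10000, h=1.0, p=3.0, seed=0):
    """Average per-period cost of an order-up-to-S policy (illustrative model).

    Each period: order up to level S (delivery is instantaneous), then a
    random demand arrives; leftover stock pays holding cost h per unit,
    unmet demand pays shortage cost p per unit and is lost.
    """
    rng = random.Random(seed)
    total = 0.0
    x = 0  # inventory on hand
    for _ in range(T):
        x = max(x, S)               # order up to S
        d = rng.randint(0, 9)       # demand, uniform on {0, ..., 9}
        x -= d
        total += h * max(x, 0) + p * max(-x, 0)
        x = max(x, 0)               # lost sales: inventory cannot go negative
    return total / T

# Search over candidate base-stock levels.
best = min(range(12), key=lambda S: simulate_base_stock(S))
```

The newsvendor critical ratio p/(p+h) = 0.75 suggests the best order-up-to level sits near the 75th percentile of demand, which the simulation search recovers.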
Probabilistic model of MDP and example on call options
Optimal choice problems (Notes on MDP examples)
For a history of the secretary problem, see T.S. Ferguson, "Who solved the secretary problem?", Statistical Science, vol. 4, no. 3, pp. 282–296, 1989.
For a detailed description of optimal choice problem and its variations, see Chapter 2 of Ferguson, "Optimal Stopping and Applications".
For a popular account of optimal stopping problems, read the American Scientist article "Knowing When to Stop".
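The classical 1/e rule for the secretary problem is easy to check by simulation. The sketch below is illustrative (candidate count, trial count, and seed are arbitrary): observe the first n/e candidates without committing, then accept the first candidate better than everyone seen so far.

```python
import math
import random

def simulate_secretary(n=50, trials=20000, seed=1):
    """Estimate the success probability of the 1/e stopping rule:
    skip the first r candidates, then take the first record-setter."""
    rng = random.Random(seed)
    r = round(n / math.e)             # observation phase length ~ n/e
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))        # 0 denotes the best candidate
        rng.shuffle(ranks)
        best_seen = min(ranks[:r])
        chosen = None
        for x in ranks[r:]:
            if x < best_seen:         # first candidate beating the sample
                chosen = x
                break
        if chosen == 0:               # success iff the overall best was chosen
            wins += 1
    return wins / trials

p = simulate_secretary()
```

The estimate lands close to 1/e ≈ 0.368, matching the asymptotic optimal success probability discussed in Ferguson's survey.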
Interchange argument and notion of state
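A standard instance of the interchange argument is minimizing total weighted completion time on one machine: swapping two adjacent jobs i, j changes the cost by w_j p_i − w_i p_j, so sequencing in increasing p/w order (Smith's rule) is optimal. A small brute-force check, with made-up job data:

```python
import itertools

def weighted_completion(jobs, order):
    """Total weighted completion time for jobs = [(proc_time, weight), ...]."""
    t = 0.0
    cost = 0.0
    for i in order:
        p, w = jobs[i]
        t += p            # completion time of job i
        cost += w * t
    return cost

jobs = [(3, 1), (1, 4), (2, 2), (4, 3)]  # illustrative (p, w) pairs

# Smith's rule: sort by processing time / weight, justified by the
# adjacent-interchange argument.
smith = sorted(range(len(jobs)), key=lambda i: jobs[i][0] / jobs[i][1])

# Exhaustive check over all orderings.
brute = min(itertools.permutations(range(len(jobs))),
            key=lambda o: weighted_completion(jobs, o))
```

For this instance both orderings achieve the same (optimal) cost, as the interchange argument predicts.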
Partially observed Markov decision processes (POMDPs)
See Kumar and Varaiya, Sec. 6.4–6.7 for the main results on POMDPs. Sec. 6.5 proves the update equations for the belief state. Also see the notes scribed by Pierre-Luc for the proof given in class.
See Bertsekas Chapter 5 for general discussion on POMDPs.
For details on computational approaches to solving POMDPs, see the lecture notes of Geoff Gordon.
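The belief-state update proved in Kumar and Varaiya, Sec. 6.5 is a one-line Bayes filter. Here is a minimal sketch with illustrative two-state numbers (the matrices are invented for the demo): prediction through the controlled transition matrix, then a pointwise Bayes correction by the observation likelihood.

```python
import numpy as np

def belief_update(b, u, y, P, O):
    """One step of the POMDP belief-state update:
    b'(x') ∝ O[u][x', y] * sum_x b(x) P[u][x, x'].

    P[u] is the transition matrix under action u;
    O[u][x', y] is the probability of observing y in next state x'.
    """
    pred = b @ P[u]               # prediction step
    post = O[u][:, y] * pred      # Bayes correction for observation y
    return post / post.sum()      # normalize

# Two-state, one-action example (illustrative numbers):
P = [np.array([[0.9, 0.1],
               [0.2, 0.8]])]
O = [np.array([[0.7, 0.3],
               [0.4, 0.6]])]
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, u=0, y=0, P=P, O=O)
```

Observation y = 0 is more likely in state 0 here, so the updated belief shifts mass toward state 0.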
Sequential hypothesis testing (Notes on POMDP examples)
The model presented in class is a variation of the model considered in Arrow, Blackwell, and Girshick, "Bayes and Minimax Solutions of Sequential Decision Problems", Econometrica, pp. 213–244, Jul.–Oct. 1949.
See Bertsekas, Sec 5.5 and Whittle Vol II Chapter 40.
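The threshold structure of the optimal sequential test is realized by Wald's SPRT: accumulate the log-likelihood ratio and stop when it exits an interval. A sketch with invented parameters (a fair coin under H0 versus a 0.7-biased coin under H1):

```python
import math
import random

def sprt(sample, llr, alpha=0.05, beta=0.05, max_steps=10000):
    """Wald's sequential probability ratio test (sketch).

    sample() draws one observation; llr(x) returns log f1(x)/f0(x).
    Stop when the cumulative LLR exits (log B, log A), where the
    thresholds come from Wald's approximations for error levels
    alpha (type I) and beta (type II).
    """
    log_A = math.log((1 - beta) / alpha)   # upper threshold -> accept H1
    log_B = math.log(beta / (1 - alpha))   # lower threshold -> accept H0
    s = 0.0
    for n in range(1, max_steps + 1):
        s += llr(sample())
        if s >= log_A:
            return "H1", n
        if s <= log_B:
            return "H0", n
    return "undecided", max_steps

# Illustrative run: data generated under H1 (bias 0.7).
rng = random.Random(2)
llr_coin = lambda x: math.log(0.7 / 0.5) if x else math.log(0.3 / 0.5)
decision, n = sprt(lambda: rng.random() < 0.7, llr_coin)
```

With data from H1, the test accepts H1 in roughly 95% of runs, after far fewer samples on average than a comparable fixed-sample test.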
Measurement scheduling in sensor networks
Stochastic orders and monotone value functions in POMDPs
The material of this lecture is adapted from Lovejoy, "Some Monotonicity Results for Partially Observed Markov Decision Processes".
See lecture notes on MDP theory.
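The monotonicity results in Lovejoy's paper rest on comparing beliefs in the monotone likelihood ratio (MLR) order, which is stronger than first-order stochastic dominance. A minimal sketch of both checks for pmfs on a finite ordered state space (examples are made up):

```python
def mlr_leq(p, q, tol=1e-12):
    """p <=_MLR q for pmfs on {0, ..., n-1}: the ratio q[i]/p[i] is
    nondecreasing in i, i.e. q[i]*p[j] <= p[i]*q[j] for all i <= j."""
    n = len(p)
    return all(q[i] * p[j] <= p[i] * q[j] + tol
               for i in range(n) for j in range(i, n))

def fosd_leq(p, q, tol=1e-12):
    """First-order stochastic dominance p <=_st q:
    every upper tail of p is no larger than the same tail of q."""
    tail_p = tail_q = 0.0
    for k in range(len(p) - 1, 0, -1):
        tail_p += p[k]
        tail_q += q[k]
        if tail_p > tail_q + tol:
            return False
    return True
```

MLR dominance implies first-order stochastic dominance, but not conversely; for example, [0.4, 0.2, 0.4] is stochastically dominated by [0.2, 0.4, 0.4] without being MLR-ordered.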
Optimal replacement in machine repair
The material of this lecture is adapted from Lu Jin, Kazuhiro Kumagai, and Kazuyuki Suzuki, "Control Limit Policy for Partially Observable Markov Decision Process Based on Stochastic Increasing Ordering", and S.C. Albright, "Structural results for partially observable Markov decision processes".
Also see Vikram Krishnamurthy, "Myopic Bounds for Optimal Policy of POMDPs: A minor extension of Lovejoy's structural results".
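The control-limit structure can be seen already in a fully observed replacement MDP; the sketch below uses invented numbers, not the model from the papers above. With deterioration levels ordered by wear, increasing operating costs, and stochastically monotone deterioration, value iteration produces a policy that replaces exactly above a threshold.

```python
import numpy as np

def replacement_policy(n=6, c_repl=4.0, gamma=0.95, iters=2000):
    """Value iteration for a simple machine-replacement MDP (illustrative).

    State = deterioration level 0..n-1.  'Keep' pays an operating cost
    equal to the state index; 'replace' pays c_repl, resets the machine
    to state 0, and operates it from there this period."""
    # Deterioration kernel: stay put w.p. 0.5, worsen one level w.p. 0.5.
    P = np.zeros((n, n))
    for s in range(n - 1):
        P[s, s] = 0.5
        P[s, s + 1] = 0.5
    P[n - 1, n - 1] = 1.0            # worst state is absorbing
    cost = np.arange(n, dtype=float)

    V = np.zeros(n)
    for _ in range(iters):
        keep = cost + gamma * P @ V
        repl = c_repl + cost[0] + gamma * P[0] @ V
        V = np.minimum(keep, repl)

    keep = cost + gamma * P @ V
    repl = c_repl + cost[0] + gamma * P[0] @ V
    return (repl < keep).astype(int)  # 1 = replace

policy = replacement_policy()
```

The computed policy keeps the machine in good states and replaces in bad ones, with a single switching point, i.e. a control limit.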
Linear Quadratic Gaussian systems
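The computational core of the LQG problem is the backward Riccati recursion for the LQR gains; by certainty equivalence, the same gains are optimal when Gaussian noise is added and the state is replaced by the Kalman filter estimate. A sketch on an illustrative double-integrator system:

```python
import numpy as np

def lqr_gains(A, B, Q, R, T):
    """Backward Riccati recursion for finite-horizon LQR.

    Returns the time-indexed gains K_0, ..., K_{T-1} for the control
    law u_t = -K_t x_t, and the final cost-to-go matrix."""
    P = Q.copy()
    gains = []
    for _ in range(T):
        # K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati step: P <- Q + A' P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1], P

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # double integrator (illustrative)
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
gains, P = lqr_gains(A, B, Q, R, T=50)
```

For a horizon this long the recursion has essentially converged, so the first gain is close to the stationary LQR gain and the closed-loop matrix A − BK is stable.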