Whenever possible, I will post notes on some of the material covered in class, but that is not guaranteed. This is a graduate class and you are responsible for taking notes in class and reading the appropriate chapters of the textbooks.
The notes will be updated as we move along in the course. Please check the dates on the first page to keep track.
Introduction, course overview, examples of MDPs.
Dynamical model of Markov decision processes (Notes on MDP examples) (Incomplete notes on MDP theory)
Optimal forest management problem
Statement and proof of dynamic programming decomposition
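The dynamic programming decomposition can be sketched as a backward induction. The two-state, two-action model below is purely illustrative (the states, rewards, and transition probabilities are hypothetical, not taken from the notes):

```python
# Backward induction for a finite-horizon MDP: solve one stage at a time,
# starting from the terminal value function.

states = [0, 1]
actions = ["a", "b"]
T = 5  # horizon

def reward(x, u):
    # state 1 is the "good" state; action "a" costs a little extra
    return x - (0.1 if u == "a" else 0.0)

def trans(x, u):
    # P(next state = 1 | x, u): action "a" reaches state 1 more often
    p1 = 0.8 if u == "a" else 0.4
    return {0: 1 - p1, 1: p1}

V = {T: {x: 0.0 for x in states}}   # terminal condition V_T = 0
policy = {}
for t in range(T - 1, -1, -1):      # the DP decomposition: one stage at a time
    V[t], policy[t] = {}, {}
    for x in states:
        q = {u: reward(x, u)
                + sum(p * V[t + 1][y] for y, p in trans(x, u).items())
             for u in actions}
        policy[t][x] = max(q, key=q.get)
        V[t][x] = q[policy[t][x]]
```

Since the reward is increasing in the state and the transitions do not depend on it, the value functions here differ by exactly the immediate-reward gap at every stage.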
Explicit solution of optimal inventory management problem
Optimality of monotone strategies (Incomplete notes on MDP theory)
Monotone dynamic programming
Most of the material was adapted from Section 4.7 of Puterman.
Power-delay tradeoff in wireless communication (Notes on MDP examples)
Showing that the optimal strategy is monotone.
Most of the material is adapted from Randy Berry's thesis.
No class
Optimal stopping problems (Notes on MDP examples) (Incomplete notes on MDP theory)
Call options and the secretary problem
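The optimal rule for the secretary problem is a cutoff rule, and its success probability has a closed form that can be checked numerically. A minimal sketch (the instance size n = 100 is illustrative):

```python
# Secretary problem: reject the first r-1 candidates, then accept the
# first candidate better than all seen so far.
# Success probability: P(r) = ((r-1)/n) * sum_{i=r}^{n} 1/(i-1) for r > 1.

def success_prob(r, n):
    if r == 1:
        return 1.0 / n  # accept the very first candidate
    return (r - 1) / n * sum(1.0 / (i - 1) for i in range(r, n + 1))

n = 100
best_r = max(range(1, n + 1), key=lambda r: success_prob(r, n))
# best_r is close to n/e, and the success probability is close to 1/e
```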
For a detailed description of optimal stopping problems, see Ferguson, "Optimal Stopping and Applications".
For a popular account of optimal stopping problems, read the American Scientist article on Knowing when to stop
For a history of the secretary problem, see T.S. Ferguson, "Who solved the secretary problem?". Statistical science 4: 282–296, 1989.
The material on optimality of threshold strategies is adapted from Sechan Oh's thesis.
Interchange argument
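A classical use of the interchange argument is single-machine scheduling: swapping two adjacent jobs that violate the ratio order w/p never decreases the total weighted completion time, so sorting by decreasing w/p (Smith's rule) is optimal. A small brute-force check on a hypothetical instance:

```python
# Interchange-argument sanity check: on one machine, ordering jobs by
# decreasing w/p minimizes sum_j w_j C_j; brute force over all
# permutations of a small instance confirms it.
from itertools import permutations

jobs = [(3, 2.0), (1, 1.0), (4, 3.0), (2, 5.0)]  # (processing time p, weight w)

def cost(order):
    t, total = 0.0, 0.0
    for p, w in order:
        t += p           # completion time of this job
        total += w * t   # accumulate weighted completion time
    return total

smith = sorted(jobs, key=lambda j: j[1] / j[0], reverse=True)
brute_best = min(cost(list(o)) for o in permutations(jobs))
```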
Linear Quadratic Regulator
See Notes on LQG.
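The finite-horizon LQR solution is a backward Riccati recursion. A scalar sketch with illustrative parameters a = b = q = r = 1 (chosen so the recursion converges to the golden ratio, which makes the fixed point easy to check by hand):

```python
# Scalar finite-horizon LQR: x_{t+1} = a x_t + b u_t + w_t,
# stage cost q x_t^2 + r u_t^2.  Backward Riccati recursion:
#   P_{t} = q + a^2 P_{t+1} - (a b P_{t+1})^2 / (r + b^2 P_{t+1}),
# with optimal gain u_t = -K_t x_t, K_t = a b P_{t+1} / (r + b^2 P_{t+1}).

a, b, q, r = 1.0, 1.0, 1.0, 1.0
p = q          # terminal condition P_T = q
gains = []
for _ in range(100):  # iterate backward in time
    k = (a * b * p) / (r + b * b * p)
    p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    gains.append(k)
```

With these numbers the fixed point satisfies p² = p + 1, i.e. p = (1 + √5)/2, and the stationary gain is 1/p.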
The idea of certainty equivalence was proposed by Simon, “Dynamic Programming Under Uncertainty with a Quadratic Criterion Function” and extended by Theil, “A Note on Certainty Equivalence in Dynamic Planning”.
For a generalization of these ideas, see Duchan, “A Clarification and a New Proof of the Certainty Equivalence Theorem”.
Partially observed Markov decision processes (POMDPs)
See Kumar and Varaiya, Sec 6.4–6.7 for the main results on POMDPs. Sec 6.5 proves the update equations for the belief state.
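The belief-state update of Sec 6.5 is a one-line Bayes rule: b'(x') ∝ P(y | x') Σₓ P(x' | x, u) b(x). A minimal sketch on a hypothetical two-state machine with a noisy binary observation (all numbers are illustrative):

```python
# POMDP belief update: predict through the transition kernel, then
# condition on the observation and normalize.

P = {"u": [[0.9, 0.1],     # P[u][x][x']: transition matrix under control "u"
           [0.3, 0.7]]}
O = [[0.8, 0.2],           # O[x'][y]: observation likelihoods
     [0.2, 0.8]]

def update(b, u, y):
    pred = [sum(b[x] * P[u][x][xn] for x in range(2)) for xn in range(2)]
    unnorm = [O[xn][y] * pred[xn] for xn in range(2)]
    z = sum(unnorm)        # probability of observing y; must be > 0
    return [v / z for v in unnorm]

b1 = update([0.5, 0.5], "u", y=1)
```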
For a description of the information state, see Kumar and Varaiya, Sec 6.4, and Mahajan and Mannan, “Decentralized Stochastic Control”.
See Bertsekas, Chapter 5 for a general discussion of POMDPs.
For details on computational approaches for solving POMDPs, see the lecture notes of Geoff Gordon.
Sequential hypothesis testing (Notes on POMDP examples)
The model presented in class is a variation of the model considered in Arrow, Blackwell, and Girshick, “Bayes and Minimax Solutions of Sequential Decision Problems”, Econometrica, pp. 213–244, Jul.–Oct. 1949.
See Bertsekas, Sec 5.5 and Whittle Vol II Chapter 40.
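The solution to the sequential hypothesis testing problem is a two-threshold rule on the likelihood ratio, i.e., Wald's SPRT. A sketch on a hypothetical Bernoulli model (biases, error levels, and the standard threshold approximations A = log((1−β)/α), B = log(β/(1−α)) are illustrative):

```python
# Sequential probability ratio test: continue sampling while the
# log-likelihood ratio stays between two thresholds.
import math

p0, p1 = 0.2, 0.8          # H0: bias 0.2, H1: bias 0.8
alpha = beta = 0.05
A = math.log((1 - beta) / alpha)   # accept H1 when llr >= A
B = math.log(beta / (1 - alpha))   # accept H0 when llr <= B

def sprt(observations):
    llr = 0.0
    for n, y in enumerate(observations, start=1):
        llr += math.log((p1 if y else 1 - p1) / (p0 if y else 1 - p0))
        if llr >= A:
            return "H1", n
        if llr <= B:
            return "H0", n
    return "continue", len(observations)

decision, n = sprt([1, 1, 1, 1, 1])
```

Each heads observation adds log 4 ≈ 1.386 to the ratio, so three heads in a row cross the upper threshold log 19 ≈ 2.944.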
Measurement scheduling in sensor networks
Stochastic orders and monotone value functions in POMDPs
The material of this lecture is adapted from Lovejoy, "Some Monotonicity Results for Partially Observed Markov Decision Processes".
Also see Vikram Krishnamurthy, "Structural results for partially observable Markov decision processes".
See lecture notes on MDP theory.
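The basic ingredient in these monotonicity results is a stochastic order on distributions; the simplest is first-order stochastic dominance, which reduces to comparing tail sums. A sketch with illustrative distributions:

```python
# First-order stochastic dominance: p >=_st q iff the tail sums of p
# are at least those of q at every point.

def fosd(p, q, tol=1e-12):
    tail_p = tail_q = 0.0
    for pi, qi in zip(reversed(p), reversed(q)):
        tail_p += pi
        tail_q += qi
        if tail_p < tail_q - tol:
            return False
    return True

# p shifts probability mass toward higher states relative to q
p = [0.1, 0.3, 0.6]
q = [0.3, 0.4, 0.3]
```

(The results of Lovejoy and Krishnamurthy typically need the stronger monotone likelihood ratio order on beliefs, which implies first-order dominance.)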
Linear Quadratic Gaussian systems
Static teams
The proof presented in class is adapted from Mahajan, Martins, and Yuksel, "Static LQG teams with countably infinite players".
The original result was proved in Radner, "Team Decision Problems". Also see Marschak and Radner, "Economic Theory of Teams".
Witsenhausen's Counterexample
Partially nested information structure
Common information approach for decentralized control
Most of the material presented in class is taken from this overview paper.
For the delayed sharing model, see the papers by Witsenhausen; Varaiya and Walrand; and Nayyar, Mahajan, and Teneketzis.
The idea of common information also works for general systems: for Markovian models, see this paper; for LQG models, see this paper.