Note that these are just suggested topics and papers. You are free to pick a paper outside the list, but please check with me to confirm that the paper is appropriate for a term project.
Application of dynamic programming/Markov decision theory to an engineering problem. Examples include queueing theory, communication theory, sensor networks, smart grids, robotics, path planning, etc.
Multi-armed bandits are the most general class of problems for which the interchange argument works.
J.C. Gittins, “Bandit Processes and Dynamic Allocation Indices”, Journal of the Royal Statistical Society, Series B, vol 41, 148–177, 1979.
P. Whittle, “Multi-armed bandits and the Gittins index”, Journal of the Royal Statistical Society, Series B, vol 42, 143–149, 1980.
T. Ishikida and P. Varaiya, “Multi-Armed Bandit Problem Revisited”, Journal of Optimization Theory and Applications, vol 83, 113–154, 1994.
E. Frostig and G. Weiss, “Four proofs of Gittins' multi-armed bandit theorem”, Unpublished note.
J. Gittins, K. Glazebrook, R. Weber, "Multi-armed Bandit Allocation Indices", Wiley 2011.
A. Mahajan and D. Teneketzis, “Multi-armed bandit problems”, in Foundations and Applications of Sensor Management, Springer-Verlag.
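For intuition about the index policies studied in the references above, here is a minimal sketch (a made-up example, not taken from any of the papers): for an arm whose reward sequence is known and deterministic, the Gittins index reduces to the largest discounted reward rate achievable over all truncation horizons.

```python
def gittins_index_deterministic(rewards, beta=0.9):
    """Gittins index of an arm that pays rewards[0], rewards[1], ...
    deterministically: sup over horizons k of
    (sum_{t<k} beta^t r_t) / (sum_{t<k} beta^t)."""
    best = float("-inf")
    num = 0.0   # discounted reward accumulated so far
    den = 0.0   # discounted time accumulated so far
    for t, r in enumerate(rewards):
        num += beta**t * r
        den += beta**t
        best = max(best, num / den)
    return best

# An arm whose first reward is small but whose later rewards are large
# still earns a high index: the supremum looks past the early rewards.
print(gittins_index_deterministic([0, 10, 10], beta=0.9))
print(gittins_index_deterministic([5], beta=0.9))  # constant arm: index 5
```

Playing, at each instant, the arm with the largest index is optimal for the discounted multi-armed bandit problem; the general (stochastic) index replaces the horizon by a stopping time.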
Constrained stochastic control refers to the setup where the decisions over time are coupled through a constraint.
F.J. Beutler and K.W. Ross, “Optimal Policies for Controlled Markov Chains with a Constraint”, Journal of Mathematical Analysis and Applications, vol 112, 236–252, 1985.
F.J. Beutler and K.W. Ross, “Time-average optimal constrained semi-Markov decision processes”, Advances in Applied Probability Vol. 18, issue 2, pp. 341-359, 1986.
K.W. Ross, “Randomized and Past-Dependent Policies for Markov Decision Processes with Multiple Constraints”, Operations Research, vol 37, issue 3, 474-477, 1989.
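A recurring theme in the references above is that constrained problems are solved by randomized stationary policies. Here is a deliberately tiny, made-up illustration (a one-state problem, not from the papers): action 1 earns reward 2 at cost 1, action 0 earns reward 1 at cost 0, and the long-run average cost must respect a budget c. The best stationary policy plays the better action with just enough probability to meet the constraint.

```python
def best_randomized_policy(c):
    """Return (theta, average_reward) where theta is the probability of
    playing action 1, under the average-cost budget c in [0, 1].
    Toy numbers: action 1 -> (reward 2, cost 1), action 0 -> (reward 1, cost 0)."""
    theta = min(c, 1.0)          # spend the whole budget on the better action
    avg_reward = 1.0 + theta     # (1 - theta)*1 + theta*2
    avg_cost = theta             # (1 - theta)*0 + theta*1
    assert avg_cost <= c + 1e-12
    return theta, avg_reward

print(best_randomized_policy(0.3))   # -> (0.3, 1.3)
```

Neither deterministic policy is optimal here: always playing action 0 is feasible but wasteful, and always playing action 1 violates the budget when c < 1. Randomization attains the best of both, which is the flavor of the Beutler-Ross and Ross results in full generality.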
Numerical algorithms for MDPs. In class, we focused on MDPs with finite state and action spaces. The following papers discuss numerical techniques for countable or uncountable state spaces.
D.P. Bertsekas, “Convergence of discretization procedures in dynamic programming”, IEEE Transactions on Automatic Control, 415–419, 1975.
M.L. Puterman, Markov decision processes: Discrete Stochastic Dynamic Programming, John Wiley, 1994. (Chapters on Value Iteration and Policy Iteration).
L.I. Sennott, “The computation of average optimal policies in denumerable state Markov decision chains”, Advances in Applied Probability, vol 29, 114–137, 1997.
Bertsekas, "Approximate Dynamic Programming", Lecture notes from stochastic control course at MIT.
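As a baseline for these papers, here is a minimal value-iteration sketch for a finite discounted MDP, in the spirit of the Puterman chapters above. The two-state, two-action numbers are made up for illustration.

```python
def value_iteration(P, r, beta=0.9, tol=1e-8):
    """P[a][s][s2]: transition probabilities, r[a][s]: rewards.
    Iterates the Bellman operator to a fixed point; returns (V, policy)."""
    n = len(r[0])
    V = [0.0] * n
    while True:
        # Q-values under the current value estimate
        Q = [[r[a][s] + beta * sum(P[a][s][t] * V[t] for t in range(n))
              for a in range(len(r))] for s in range(n)]
        V_new = [max(Q[s]) for s in range(n)]
        if max(abs(V_new[s] - V[s]) for s in range(n)) < tol:
            policy = [max(range(len(r)), key=lambda a: Q[s][a]) for s in range(n)]
            return V_new, policy
        V = V_new

P = [[[0.9, 0.1], [0.2, 0.8]],   # action 0 transitions
     [[0.5, 0.5], [0.6, 0.4]]]   # action 1 transitions
r = [[1.0, 0.0],                 # action 0 rewards in states 0, 1
     [2.0, 0.5]]                 # action 1 rewards
V, policy = value_iteration(P, r)
```

The papers above are concerned with what replaces the finite loop over states when the state space is countable or a continuum: discretization (Bertsekas), truncation (Sennott), and function approximation (the lecture notes).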
Numerical solution of POMDPs.
R.D. Smallwood and E.J. Sondik, “The Optimal Control of Partially Observable Markov Processes over a Finite Horizon”, Operations Research, vol 21, 1071–1088, 1973.
J. Rust, “Using randomization to break the curse of dimensionality”, Econometrica, vol 65, 487–516, 1997.
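The computational object in these papers is the belief state. Here is a minimal Bayes-update sketch for a two-state POMDP, the basic recursion underlying the Smallwood-Sondik value iteration; the transition and observation probabilities are made-up numbers.

```python
def belief_update(b, P, O, y):
    """One Bayes step: prior belief b over states, transition matrix
    P[s][s2], observation likelihoods O[s2][y]; returns the posterior
    belief after observing y."""
    n = len(b)
    unnorm = [O[s2][y] * sum(b[s1] * P[s1][s2] for s1 in range(n))
              for s2 in range(n)]
    z = sum(unnorm)              # P(y | b): normalizing constant
    return [u / z for u in unnorm]

P = [[0.8, 0.2], [0.3, 0.7]]     # state dynamics
O = [[0.9, 0.1], [0.4, 0.6]]     # P(observation y | new state s2)
b = belief_update([0.5, 0.5], P, O, y=0)
```

The curse of dimensionality that Rust's randomization attacks enters because the value function lives on the simplex of such beliefs, not on the finite state space.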
Q-learning and stochastic adaptive control. In class, we focused on systems where the system dynamics and the cost function are known. Q-learning and stochastic adaptive control address the case when the system parameters are unknown.
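To make the "unknown parameters" point concrete, here is a minimal tabular Q-learning sketch: the learner never uses the transition or reward model, only sampled (state, action, reward, next-state) tuples from a simulator. The two-state toy MDP below (rewards, dynamics, step size, exploration rate) is made up for illustration.

```python
import random

random.seed(0)
n_states, n_actions, beta, alpha, eps = 2, 2, 0.9, 0.05, 0.15

def step(s, a):
    """Simulator, unknown to the learner: returns (reward, next state)."""
    if a == 1:
        return (5.0 if s == 0 else 0.5), random.choice([0, 1])
    return (1.0 if s == 0 else 0.0), s   # action 0: stay put

Q = [[0.0] * n_actions for _ in range(n_states)]
s = 0
for _ in range(30000):
    # epsilon-greedy exploration
    if random.random() < eps:
        a = random.randrange(n_actions)
    else:
        a = max(range(n_actions), key=lambda u: Q[s][u])
    r, s2 = step(s, a)
    # stochastic-approximation step toward the Bellman target
    Q[s][a] += alpha * (r + beta * max(Q[s2]) - Q[s][a])
    s = s2

greedy = [max(range(n_actions), key=lambda u: Q[x][u]) for x in range(n_states)]
```

The update is exactly value iteration with the expectation replaced by a single sampled transition, which is why convergence rests on stochastic-approximation arguments rather than the contraction arguments used in class.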
Team theory.
Marschak and Radner, "Economic Theory of Teams", Yale University Press, 1972. (Chapters 4 and 5; available electronically)
Y.C. Ho and K.C. Chu, “Team decision theory and information structures in optimal control problems–Part I”, IEEE Transactions on Automatic Control, vol 17, 15-22, 1972.
A. Nayyar, A. Mahajan, and D. Teneketzis, “Optimal Control Strategies in Delayed Sharing Information Structures”, IEEE Transactions on Automatic Control, vol 56, 1606-1620, 2011.
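A tiny static-team sketch in the Marschak-Radner spirit: two agents observe the same binary state through independent noisy channels and choose actions without communicating, so as to minimize a common expected cost. All probabilities and the cost function below are made up for illustration.

```python
from itertools import product

states = [0, 1]
p_state = {0: 0.5, 1: 0.5}
eps1, eps2 = 0.1, 0.3            # channel error probabilities

def p_obs(y, x, eps):
    """Binary symmetric channel: P(observe y | state x)."""
    return 1 - eps if y == x else eps

def expected_cost(g1, g2):
    """g_i maps agent i's observation to its action; the (made-up) team
    cost is 0 when both actions match the state and 1 otherwise."""
    total = 0.0
    for x, y1, y2 in product(states, states, states):
        pr = p_state[x] * p_obs(y1, x, eps1) * p_obs(y2, x, eps2)
        total += pr * (0.0 if g1[y1] == x and g2[y2] == x else 1.0)
    return total

# With binary observations each agent has only 4 pure strategies, so the
# team problem can be brute-forced; in general, this joint search over
# strategy profiles is what makes team problems hard.
strategies = [dict(zip(states, acts)) for acts in product([0, 1], repeat=2)]
cost, g1, g2 = min(((expected_cost(g1, g2), g1, g2)
                    for g1 in strategies for g2 in strategies),
                   key=lambda t: t[0])
```

The optimization is over strategies (functions of private observations) rather than actions, which is the defining feature of the team-theoretic information structures studied in the references above.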