Note that these are just suggested topics and papers. You are free to pick a paper outside the list, but please check with me to confirm that the paper is appropriate for a term project.
Application of dynamic programming/Markov decision theory to an engineering problem. Examples include queueing theory, communication theory, sensor networks, smart grids, robotics, path planning, etc.
Multi-armed bandits are the most general class of problems for which the interchange argument works.
J.C. Gittins, “Bandit Processes and Dynamic Allocation Indices”, Journal of the Royal Statistical Society, Series B, vol 41, 148–177, 1979.
P. Whittle, “Multi-armed bandits and the Gittins index”, Journal of the Royal Statistical Society, Series B, vol 42, 143–149, 1980.
T. Ishikida and P. Varaiya, “Multi-Armed Bandit Problem Revisited”, Journal of Optimization Theory and Applications, vol 83, 113–154, 1994.
E. Frostig and G. Weiss, “Four proofs of Gittins' multi-armed bandit theorem”, Unpublished note.
J. Gittins, K. Glazebrook, R. Weber, "Multi-armed Bandit Allocation Indices", Wiley 2011.
A. Mahajan and D. Teneketzis, “Multi-armed bandit problems”, in Foundations and Applications of Sensor Management, Springer-Verlag.
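For intuition about the index policies studied in the references above, here is a minimal sketch (a made-up example, not taken from any of the papers): for an arm whose reward sequence is known and deterministic, the Gittins index reduces to the largest discounted reward rate achievable over all truncation horizons.

```python
def gittins_index_deterministic(rewards, beta=0.9):
    """Gittins index of an arm that pays rewards[0], rewards[1], ...
    deterministically: sup over horizons k of
    (sum_{t<k} beta^t r_t) / (sum_{t<k} beta^t)."""
    best = float("-inf")
    num = 0.0   # discounted reward accumulated so far
    den = 0.0   # discounted time accumulated so far
    for t, r in enumerate(rewards):
        num += beta**t * r
        den += beta**t
        best = max(best, num / den)
    return best

# An arm whose first reward is small but whose later rewards are large
# still earns a high index: the supremum looks past the early rewards.
print(gittins_index_deterministic([0, 10, 10], beta=0.9))
print(gittins_index_deterministic([5], beta=0.9))  # constant arm: index 5
```

Playing, at each instant, the arm with the largest index is optimal for the discounted multi-armed bandit problem; the general (stochastic) index replaces the horizon by a stopping time.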
Constrained stochastic control refers to the setup where the decisions over time are coupled through a constraint.
F.J. Beutler and K.W. Ross, “Optimal Policies for Controlled Markov Chains with a Constraint”, Journal of Mathematical Analysis and Applications, vol 112, 236–252, 1985.
F.J. Beutler and K.W. Ross, “Time-average optimal constrained semi-Markov decision processes”, Advances in Applied Probability Vol. 18, issue 2, pp. 341-359, 1986.
K.W. Ross, “Randomized and Past-Dependent Policies for Markov Decision Processes with Multiple Constraints”, Operations Research, vol 37, issue 3, 474-477, 1989.
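A recurring theme in the references above is that constrained problems are solved by randomized stationary policies. Here is a deliberately tiny, made-up illustration (a one-state problem, not from the papers): action 1 earns reward 2 at cost 1, action 0 earns reward 1 at cost 0, and the long-run average cost must respect a budget c. The best stationary policy plays the better action with just enough probability to meet the constraint.

```python
def best_randomized_policy(c):
    """Return (theta, average_reward) where theta is the probability of
    playing action 1, under the average-cost budget c in [0, 1].
    Toy numbers: action 1 -> (reward 2, cost 1), action 0 -> (reward 1, cost 0)."""
    theta = min(c, 1.0)          # spend the whole budget on the better action
    avg_reward = 1.0 + theta     # (1 - theta)*1 + theta*2
    avg_cost = theta             # (1 - theta)*0 + theta*1
    assert avg_cost <= c + 1e-12
    return theta, avg_reward

print(best_randomized_policy(0.3))   # -> (0.3, 1.3)
```

Neither deterministic policy is optimal here: always playing action 0 is feasible but wasteful, and always playing action 1 violates the budget when c < 1. Randomization attains the best of both, which is the flavor of the Beutler-Ross and Ross results in full generality.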
Numerical algorithms for MDPs. In class, we focused on MDPs with finite state and action spaces. The following papers discuss numerical techniques for countable or uncountable state spaces.
D.P. Bertsekas, “Convergence of discretization procedures in dynamic programming”, IEEE Transactions on Automatic Control, 415–419, 1975.
M.L. Puterman, Markov decision processes: Discrete Stochastic Dynamic Programming, John Wiley, 1994. (Chapters on Value Iteration and Policy Iteration).
L.I. Sennott, “The computation of average optimal policies in denumerable state Markov decision chains”, Advances in Applied Probability, vol 29, 114–137, 1997.
Bertsekas, "Approximate Dynamic Programming", Lecture notes from stochastic control course at MIT.
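As a baseline for these papers, here is a minimal value-iteration sketch for a finite discounted MDP, in the spirit of the Puterman chapters above. The two-state, two-action numbers are made up for illustration.

```python
def value_iteration(P, r, beta=0.9, tol=1e-8):
    """P[a][s][s2]: transition probabilities, r[a][s]: rewards.
    Iterates the Bellman operator to a fixed point; returns (V, policy)."""
    n = len(r[0])
    V = [0.0] * n
    while True:
        # Q-values under the current value estimate
        Q = [[r[a][s] + beta * sum(P[a][s][t] * V[t] for t in range(n))
              for a in range(len(r))] for s in range(n)]
        V_new = [max(Q[s]) for s in range(n)]
        if max(abs(V_new[s] - V[s]) for s in range(n)) < tol:
            policy = [max(range(len(r)), key=lambda a: Q[s][a]) for s in range(n)]
            return V_new, policy
        V = V_new

P = [[[0.9, 0.1], [0.2, 0.8]],   # action 0 transitions
     [[0.5, 0.5], [0.6, 0.4]]]   # action 1 transitions
r = [[1.0, 0.0],                 # action 0 rewards in states 0, 1
     [2.0, 0.5]]                 # action 1 rewards
V, policy = value_iteration(P, r)
```

The papers above are concerned with what replaces the finite loop over states when the state space is countable or a continuum: discretization (Bertsekas), truncation (Sennott), and function approximation (the lecture notes).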
Numerical solution of POMDPs.
R.D. Smallwood and E.J. Sondik, “The Optimal Control of Partially Observable Markov Processes over a Finite Horizon”, Operations Research, vol 21, 1071–1088, 1973.
J. Rust, “Using randomization to break the curse of dimensionality”, Econometrica, vol 65, 487–516, 1997.
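The computational object in these papers is the belief state. Here is a minimal Bayes-update sketch for a two-state POMDP, the basic recursion underlying the Smallwood-Sondik value iteration; the transition and observation probabilities are made-up numbers.

```python
def belief_update(b, P, O, y):
    """One Bayes step: prior belief b over states, transition matrix
    P[s][s2], observation likelihoods O[s2][y]; returns the posterior
    belief after observing y."""
    n = len(b)
    unnorm = [O[s2][y] * sum(b[s1] * P[s1][s2] for s1 in range(n))
              for s2 in range(n)]
    z = sum(unnorm)              # P(y | b): normalizing constant
    return [u / z for u in unnorm]

P = [[0.8, 0.2], [0.3, 0.7]]     # state dynamics
O = [[0.9, 0.1], [0.4, 0.6]]     # P(observation y | new state s2)
b = belief_update([0.5, 0.5], P, O, y=0)
```

The curse of dimensionality that Rust's randomization attacks enters because the value function lives on the simplex of such beliefs, not on the finite state space.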
Q-learning and stochastic adaptive control. In class, we focused on systems where the system dynamics and the cost function are known. Q-learning and stochastic adaptive control address the case when the system parameters are unknown.
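To make the "unknown parameters" point concrete, here is a minimal tabular Q-learning sketch: the learner never uses the transition or reward model, only sampled (state, action, reward, next-state) tuples from a simulator. The two-state toy MDP below (rewards, dynamics, step size, exploration rate) is made up for illustration.

```python
import random

random.seed(0)
n_states, n_actions, beta, alpha, eps = 2, 2, 0.9, 0.05, 0.15

def step(s, a):
    """Simulator, unknown to the learner: returns (reward, next state)."""
    if a == 1:
        return (5.0 if s == 0 else 0.5), random.choice([0, 1])
    return (1.0 if s == 0 else 0.0), s   # action 0: stay put

Q = [[0.0] * n_actions for _ in range(n_states)]
s = 0
for _ in range(30000):
    # epsilon-greedy exploration
    if random.random() < eps:
        a = random.randrange(n_actions)
    else:
        a = max(range(n_actions), key=lambda u: Q[s][u])
    r, s2 = step(s, a)
    # stochastic-approximation step toward the Bellman target
    Q[s][a] += alpha * (r + beta * max(Q[s2]) - Q[s][a])
    s = s2

greedy = [max(range(n_actions), key=lambda u: Q[x][u]) for x in range(n_states)]
```

The update is exactly value iteration with the expectation replaced by a single sampled transition, which is why convergence rests on stochastic-approximation arguments rather than the contraction arguments used in class.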
Team theory.
Marschak and Radner, "Economic Theory of Teams", Yale University Press, 1972. (Chapters 4 and 5; available electronically)
Y.C. Ho and K.C. Chu, “Team decision theory and information structures in optimal control problems–Part I”, IEEE Transactions on Automatic Control, vol 17, 15-22, 1972.
A. Nayyar, A. Mahajan, and D. Teneketzis, “Optimal Control Strategies in Delayed Sharing Information Structures”, IEEE Transactions on Automatic Control, vol 56, 1606-1620, 2011.
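A tiny static-team sketch in the Marschak-Radner spirit: two agents observe the same binary state through independent noisy channels and choose actions without communicating, so as to minimize a common expected cost. All probabilities and the cost function below are made up for illustration.

```python
from itertools import product

states = [0, 1]
p_state = {0: 0.5, 1: 0.5}
eps1, eps2 = 0.1, 0.3            # channel error probabilities

def p_obs(y, x, eps):
    """Binary symmetric channel: P(observe y | state x)."""
    return 1 - eps if y == x else eps

def expected_cost(g1, g2):
    """g_i maps agent i's observation to its action; the (made-up) team
    cost is 0 when both actions match the state and 1 otherwise."""
    total = 0.0
    for x, y1, y2 in product(states, states, states):
        pr = p_state[x] * p_obs(y1, x, eps1) * p_obs(y2, x, eps2)
        total += pr * (0.0 if g1[y1] == x and g2[y2] == x else 1.0)
    return total

# With binary observations each agent has only 4 pure strategies, so the
# team problem can be brute-forced; in general, this joint search over
# strategy profiles is what makes team problems hard.
strategies = [dict(zip(states, acts)) for acts in product([0, 1], repeat=2)]
cost, g1, g2 = min(((expected_cost(g1, g2), g1, g2)
                    for g1 in strategies for g2 in strategies),
                   key=lambda t: t[0])
```

The optimization is over strategies (functions of private observations) rather than actions, which is the defining feature of the team-theoretic information structures studied in the references above.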