ECSE 506: Stochastic Control and Decision Theory

Aditya Mahajan
Winter 2014



Assignment 3 (due Feb 10, 2014)

  1. Consider the inventory management problem considered in class. Suppose that the demand has a uniform distribution with support \(\{0, 1, \dots, d\}\). Impose an additional assumption that there is an upper bound \(\bar b\) and a lower bound \(\underline b\) to the allowable values of the stock \(X_t\). This imposes an additional constraint on \(u_t\) that is \[ \underline b + d \le u_t + x_t \le \bar b \]

    1. Show that the structure of the optimal strategy remains the same, i.e., there exist scalars \(\{s_t\}_{t=1}^T\) such that the optimal strategy is of the form \[ g_t(x_t) = \begin{cases} s_t - x_t, & \text{if $x_t \le s_t$} \\ 0, & \text{otherwise.} \end{cases} \]

    2. Let \(T = 100\), \(\underline b = 5\), \(\bar b = 30\), \(d = 5\), \(p=1\), \(a=2\). Use dynamic programming to compute the sequence \(\{s_t\}_{t=1}^T\). Submit a plot of the sequence along with the code used to compute it.
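The backward recursion in part 2 can be sketched as follows. The per-stage cost used here (ordering cost \(a\) per unit plus a charge \(p\) per unit of post-demand stock) is a placeholder assumption; substitute the cost model used in class. The threshold \(s_t\) is read off as the order-up-to level from the lowest state, which is valid whenever \(s_t \ge \underline b + d\).

```python
import numpy as np

# Parameters from part 2; the stage cost below is an assumed placeholder.
T, b_lo, b_hi, d = 100, 5, 30, 5
p, a = 1, 2

def stage_cost(x, u, w):
    # Hypothetical model: ordering cost a*u plus holding cost p per unit
    # of next-period stock. Replace with the cost function from class.
    return a * u + p * (x + u - w)

states = range(b_lo, b_hi + 1)     # stock constrained to [b_lo, b_hi]
demands = range(d + 1)             # uniform demand on {0, ..., d}

V = {x: 0.0 for x in states}       # terminal value function
s = np.zeros(T + 1, dtype=int)     # threshold sequence s_1, ..., s_T

for t in range(T, 0, -1):
    V_new, policy = {}, {}
    for x in states:
        best = None
        # feasible orders: b_lo + d <= x + u <= b_hi
        for u in range(max(0, b_lo + d - x), b_hi - x + 1):
            cost = sum(stage_cost(x, u, w) + V[x + u - w]
                       for w in demands) / (d + 1)
            if best is None or cost < best:
                best, policy[x] = cost, u
        V_new[x] = best
    V = V_new
    # order-up-to level from the lowest state (assumed not clipped)
    s[t] = b_lo + policy[b_lo]
```

Plotting `s[1:]` against \(t\) then gives the requested figure.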

  2. You are trying to find the parking space that minimizes the time needed to get to a restaurant. There are 50 parking spaces, and you see spaces 1, 2, ..., 50 in order. As you approach each parking space, you see whether it is full or empty. We assume, somewhat heroically, that the occupancy of the spaces follows an independent Bernoulli process, which is to say that each space is occupied with probability \(p\) and free with probability \(1-p\), independently of the others.

    It takes 2 seconds to drive past each parking space and 8 seconds to walk past one. That is, if you park in space \(n\), it will require \(8(50-n)\) seconds to walk to the restaurant. Furthermore, it would have taken \(2n\) seconds to get to this space. If you get to the last space without finding an opening, then you will have to drive into a special lot down the block, adding 30 seconds to your trip.

    1. Represent the problem as a dynamical system. What is the state space? What is the action space? What is the time horizon? What is the cost function?

    2. Write down the dynamic program for this problem.

    3. Assume \(p=0.6\). Write a program to solve the dynamic program. Describe the optimal strategy either by giving a succinct representation of the strategy or by tabulating the strategy for all states and times. (Include your code in the solution.)
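The backward recursion for part 3 can be sketched as below. Here the driving time already spent is treated as sunk, so continuing past a space is charged 2 seconds; this accounting convention is an assumption, but it does not change the optimal strategy.

```python
# Parking DP sketch: V[n] is the expected remaining cost on arriving at
# space n, before observing whether it is free.
N, p = 50, 0.6
WALK, DRIVE, LOT = 8, 2, 30

V = [0.0] * (N + 2)
V[N + 1] = LOT                  # drove past the last space: special lot
park = [False] * (N + 1)        # park[n]: take space n if it is free?

for n in range(N, 0, -1):
    cost_park = WALK * (N - n)          # walk from space n
    cost_go = DRIVE + V[n + 1]          # drive on to the next space
    park[n] = cost_park < cost_go
    # with prob 1-p the space is free (pick the cheaper option),
    # with prob p it is occupied (must drive on)
    V[n] = (1 - p) * min(cost_park, cost_go) + p * cost_go
```

The optimal strategy is then described by the set of spaces \(n\) with `park[n] == True`, which turns out to be a threshold rule: take the first free space at or beyond some critical index.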

  3. Imagine that you are taking three courses that have assigned a term project instead of a final exam. You quickly estimate how much time you would need to spend on each project to get 100 points. Assume that if you spend \(h\) hours on a project, which you estimated would need \(H\) hours to get 100 points, then for \(h < H\) your score will be \[ R = 100 \sqrt{h/H}. \] Thus, there are declining marginal returns to putting more work into a project. The time you estimate to get the full score on each of the three projects is given by

    | Project | Completion time \(H\) |
    |---------|-----------------------|
    | 1       | 20                    |
    | 2       | 15                    |
    | 3       | 10                    |

    You decide that you cannot spend more than a total of 30 hours on the projects, and you want to choose a value of \(h\) for each project that is a multiple of 5 hours. You also feel that you need to spend at least 5 hours on each project (so that you do not completely ignore a project).

    You want to use dynamic programming to find an optimal allocation to maximize the total reward.

    1. Represent the problem as a dynamical system. What is the state space? What is the action space? What is the time horizon? What is the reward function?

    2. Write down the dynamic program for this problem.

    3. Numerically solve the dynamic program to find the optimal time allocation.
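A dynamic-programming sketch for this allocation problem follows, treating the projects as stages and the remaining hour budget as the state. Capping the score at 100 points for \(h \ge H\) is an assumption consistent with the problem statement.

```python
import math

H = [20, 15, 10]               # hours needed for 100 points on each project
BUDGET, STEP, MIN_H = 30, 5, 5

def reward(h, Hk):
    # Assumed cap: a project cannot score more than 100 points.
    return 100.0 if h >= Hk else 100.0 * math.sqrt(h / Hk)

cache = {}
def V(k, b):
    """Best total score from projects k, ..., 2 with b hours remaining."""
    if k == len(H):
        return 0.0
    if (k, b) not in cache:
        # reserve at least MIN_H hours for each remaining project
        reserve = MIN_H * (len(H) - k - 1)
        choices = range(MIN_H, b - reserve + 1, STEP)
        cache[(k, b)] = max(reward(h, H[k]) + V(k + 1, b - h)
                            for h in choices)
    return cache[(k, b)]
```

Since the reward is increasing in \(h\), the full 30-hour budget is used; evaluating `V(0, BUDGET)` gives the optimal total score, and the optimal allocation can be recovered by recording the maximizing `h` at each stage.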