ECSE 506: Stochastic Control and Decision Theory

Aditya Mahajan
Winter 2014



\(\def\PR{\mathbb P} \def\EXP{\mathbb E} \def\reals{\mathbb R}\)

Assignment 5 (due Mar 17, 2014)

  1. A person is offered \(N\) free plays to be distributed as he pleases between two slot machines \(A\) and \(B\). Machine \(A\) pays \(\alpha\) dollars with known probability \(s\) and nothing with probability \((1-s)\). Machine \(B\) pays \(\beta\) dollars with probability \(p\) and nothing with probability \((1-p)\). The person does not know \(p\) but has an a priori probability distribution function \(f(p)\) for \(p\). The problem is to find a playing policy that maximizes expected profit.

    Suppose at the \(t\)-th time step, the person has played machine \(B\) a total of \(m + n\) times, where \(m\) denotes the number of successes and \(n\) the number of failures. Let \(\xi(m,n)\) denote the probability that the next play of machine \(B\) will yield a success, that is, \[\xi(m,n) = \frac {\int_{0}^1 p^{m+1} (1 - p)^n f(p) dp } {\int_{0}^1 p^m (1 - p)^n f(p) dp }\] Note that \(\xi(m,n)\) can be precomputed for all values of \(m\) and \(n\).

    1. Show that \((m,n)\) is a sufficient statistic for the above problem.
    2. Write down a dynamic program in terms of \((m,n)\).
    3. Find an optimal strategy for \(T = 6\), \(\alpha = \beta = 1\), \(s = 0.6\), \(f(p) = 1\) for \(0 \le p \le 1\). (For each time, you need to identify the pairs \((m,n)\) where machine \(B\) should be played.)
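The dynamic program of parts 2–3 can be sketched numerically. For the uniform prior \(f(p)=1\), the integrals defining \(\xi(m,n)\) are Beta functions, which simplify to \(\xi(m,n) = (m+1)/(m+n+2)\). The sketch below (function names are mine; the value-function recursion is the natural one, not necessarily the form expected in the written solution) computes the optimal expected profit with \(k\) plays remaining in state \((m,n)\):

```python
from functools import lru_cache

ALPHA = BETA = 1.0   # payouts of machines A and B
S = 0.6              # known success probability of machine A
T = 6                # number of free plays

def xi(m, n):
    # With the uniform prior f(p) = 1, the Beta integrals reduce to
    # xi(m, n) = (m + 1) / (m + n + 2).
    return (m + 1) / (m + n + 2)

@lru_cache(maxsize=None)
def V(k, m, n):
    """Optimal expected profit with k plays remaining, after m successes
    and n failures on machine B."""
    if k == 0:
        return 0.0
    play_A = S * ALPHA + V(k - 1, m, n)
    q = xi(m, n)
    play_B = q * (BETA + V(k - 1, m + 1, n)) + (1 - q) * V(k - 1, m, n + 1)
    return max(play_A, play_B)

def best_action(k, m, n):
    """Return which machine an optimal policy plays in state (m, n)."""
    q = xi(m, n)
    play_A = S * ALPHA + V(k - 1, m, n)
    play_B = q * (BETA + V(k - 1, m + 1, n)) + (1 - q) * V(k - 1, m, n + 1)
    return 'B' if play_B > play_A else 'A'
```

Enumerating `best_action(T - t, m, n)` over the reachable pairs \((m, n)\) at each time \(t\) identifies where machine \(B\) should be played; note that always playing \(A\) is feasible, so \(V(6,0,0) \ge 6 \times 0.6 = 3.6\).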
  2. Consider the following modification of the sequential hypothesis testing problem considered in class. As in the model discussed in class, there are two hypotheses \(h_0\) and \(h_1\). The a priori probability that the hypothesis is \(h_0\) is \(p\).

    In contrast to the model discussed in class, there are \(N\) sensors. If sensor \(m\) is used at time \(t\) and the underlying hypothesis is \(h_i\), then the observation \(Y_t\) is distributed according to pdf (or pmf) \(f^m_i(y)\). The cost of using sensor \(m\) is \(c_m\).

    Whenever the decision maker takes a measurement, he picks a sensor \(m\) uniformly at random from \(\{1, \dots, N\}\), observes \(Y_t\) according to the distribution \(f^m_i(\cdot)\) corresponding to the true hypothesis \(h_i\), and incurs a cost \(c_m\).

    The system continues for a finite time \(T\). At each time \(t < T\), the decision maker has three options: stop and declare \(h_0\), stop and declare \(h_1\), or continue to take another measurement. At time \(T\), the continue alternative is unavailable.

    1. Formulate the above problem as a POMDP. Identify an information state and write the dynamic programming decomposition for the problem.

    2. Show that the optimal control law has a threshold property, similar to the threshold property for the model considered in class.
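As in the single-sensor model from class, the posterior probability of \(h_0\) is a natural candidate for the information state. A minimal sketch of its Bayes-rule update (the function name is mine, and it assumes the decision maker learns which sensor \(m\) was used):

```python
def belief_update(p, y, f0, f1):
    """Posterior probability of h0 after observing y from a sensor
    whose observation densities are f0 (under h0) and f1 (under h1).

    p is the probability of h0 before the measurement."""
    num = p * f0(y)
    return num / (num + (1 - p) * f1(y))
```

For instance, an uninformative sensor (\(f^m_0 = f^m_1\)) leaves the belief unchanged, while a likelihood ratio of 4 in favour of \(h_0\) moves a belief of 0.5 to 0.8.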

  3. Let \(X\), \(Y\), and \(Z\) be jointly Gaussian with \(Y\) and \(Z\) independent. Let \[X \sim {\cal N}(\mu_X, \Sigma_X), \quad Y \sim {\cal N}(\mu_Y, \Sigma_Y), \quad Z \sim {\cal N}(\mu_Z, \Sigma_Z)\] and \[ \text{cov}(X,Y) = \Sigma _ {XY}, \quad \text{cov}(X,Z) = \Sigma _ {XZ}, \quad \text{cov}(Y,Z) = 0.\] Define \[\hat X = \EXP[ X | Y, Z], \quad \tilde X = X - \hat X.\] Then, show that \[\begin{align} \hat X &= \mu_X + \Sigma_{XY} \Sigma_Y^{-1} (Y - \mu_Y) + \Sigma_{XZ} \Sigma_Z^{-1} (Z - \mu_Z) \\ \Sigma_{\tilde X} &= \Sigma_X - \Sigma_{XY} \Sigma_Y^{-1} \Sigma_{XY}^{\intercal} - \Sigma_{XZ} \Sigma_Z^{-1} \Sigma_{XZ}^{\intercal}. \end{align}\]
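The claimed formulas can be sanity-checked numerically in the scalar case. The sketch below uses a hypothetical instance of my own choosing: \(Y, Z \sim {\cal N}(0,1)\) independent and \(X = 2Y - Z + W\) with \(W \sim {\cal N}(0, 0.25)\), so that \(\Sigma_{XY} = 2\), \(\Sigma_{XZ} = -1\), and the formulas predict \(\hat X = 2Y - Z\) with \(\Sigma_{\tilde X} = 0.25\):

```python
import random

random.seed(0)
N_SAMPLES = 200_000

# Scalar test case: Sigma_Y = Sigma_Z = 1, Sigma_XY = 2, Sigma_XZ = -1,
# all means zero.  The claimed estimator is Xhat = 2Y - Z, and the
# claimed error variance is Var(X) - 4 - 1 = 0.25.
sq_err = 0.0
for _ in range(N_SAMPLES):
    y = random.gauss(0, 1)
    z = random.gauss(0, 1)
    x = 2 * y - z + random.gauss(0, 0.5)  # std 0.5 -> variance 0.25
    x_hat = 2 * y - z                     # formula with all means zero
    sq_err += (x - x_hat) ** 2

sigma_tilde = sq_err / N_SAMPLES          # Monte Carlo estimate of Var(X - Xhat)
```

With this seed the empirical error variance lands within Monte Carlo noise of the predicted 0.25.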

  4. Consider a linear system \[\begin{align} X _ {t+1} &= A _ t X _ t + B _ t U _ t + W _ t \\ Y _ t &= C _ t X _ t + N _ t \end{align}\]

    The primitive random variables \((X_1, W _ {1:T}, N _ {1:T})\) are jointly Gaussian. \(X_1\), \((W_1, N_1)\), \((W_2, N_2)\), ..., \((W_T, N_T)\) are independent. However, in contrast to the model considered in class, \(W_t\) and \(N_t\) are not independent. In particular, \[X_1 \sim {\cal N}(0, \Sigma_X), \quad W_t \sim {\cal N}(0, \Sigma_W), \quad N_t \sim {\cal N}(0, \Sigma_N), \quad \text{cov}(W_t, N_t) = \Gamma.\]

    Show that the system can be written as \[ X_{t+1} = \overline A_t X_t + \overline U_t + \overline W_t\] with an appropriate choice of \(\overline A\), \(\overline U\), and \(\overline W\) such that \((X_1, \overline W_{1:T}, N_{1:T})\) are independent and Gaussian.
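The key step is to split \(W_t\) into its projection on \(N_t\) plus an independent remainder; a seeded scalar Monte Carlo check (all parameter values here are hypothetical, chosen only for illustration) confirms that \(W_t - \Gamma \Sigma_N^{-1} N_t\) is uncorrelated with, hence by joint Gaussianity independent of, \(N_t\):

```python
import random

random.seed(1)
N_SAMPLES = 200_000

SIG_W2, SIG_N2, GAMMA = 1.0, 1.0, 0.5   # hypothetical scalar Sigma_W, Sigma_N, Gamma

cov = 0.0
for _ in range(N_SAMPLES):
    n = random.gauss(0, SIG_N2 ** 0.5)
    # Construct w so that cov(w, n) = GAMMA and var(w) = SIG_W2.
    w = (GAMMA / SIG_N2) * n + random.gauss(0, (SIG_W2 - GAMMA**2 / SIG_N2) ** 0.5)
    w_bar = w - (GAMMA / SIG_N2) * n     # decorrelated remainder
    cov += w_bar * n

cov /= N_SAMPLES                         # empirical cov(w_bar, n), should be near 0
```

Substituting \(N_t = Y_t - C_t X_t\) into the state equation then yields the required \(\overline A_t\) and \(\overline U_t\).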