ECSE 506: Stochastic Control and Decision Theory

Aditya Mahajan
Winter 2016



Assignment 3 (due March 28, 2016)

\(\def\PR{\mathbb P} \def\EXP{\mathbb E} \def\TRANS{\intercal} \def\reals{\mathbb R} \def\integers{\mathbb Z}\)

  1. A person is offered \(N\) free plays to be distributed as he pleases between two slot machines \(A\) and \(B\). Machine \(A\) pays \(\alpha\) dollars with known probability \(s\) and nothing with probability \((1-s)\). Machine \(B\) pays \(\beta\) dollars with probability \(p\) and nothing with probability \((1-p)\). The person does not know \(p\) but has an a priori probability distribution function \(f(p)\) for \(p\). The problem is to find a playing policy that maximizes expected profit.

    Suppose at the \(t\)-th time step, the person has played machine \(B\) a total of \(m + n\) times, where \(m\) denotes the number of successes and \(n\) the number of failures. Let \(\xi(m,n)\) denote the probability that the next play of machine \(B\) will yield a success, that is, \[\xi(m,n) = \frac {\int_{0}^1 p^{m+1} (1 - p)^n f(p) dp } {\int_{0}^1 p^m (1 - p)^n f(p) dp }\] Note that \(\xi(m,n)\) can be precomputed for all values of \(m\) and \(n\).

    1. Show that \((m,n)\) is a sufficient statistic for the above problem.
    2. Write down a dynamic program in terms of \((m,n)\).
    3. Find an optimal strategy for \(T = 6\), \(\alpha = \beta = 1\), \(s = 0.6\), \(f(p) = 1\) for \(0 \le p \le 1\). (For each time, you need to identify the pairs \((m,n)\) where machine \(B\) should be played.)
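    The dynamic program of parts 2–3 can be sketched numerically. For the uniform prior \(f(p) = 1\), the Beta integrals simplify to \(\xi(m,n) = (m+1)/(m+n+2)\). The code below is a minimal sketch (function names and output format are my own, not part of the assignment): with \(k\) plays remaining, playing \(A\) earns \(s\alpha\) and leaves \((m,n)\) unchanged, while playing \(B\) earns \(\beta\) with probability \(\xi(m,n)\) and moves the state to \((m+1,n)\) or \((m,n+1)\).

    ```python
    from functools import lru_cache

    T = 6        # horizon (part 3)
    S = 0.6      # known success probability of machine A
    ALPHA = 1.0  # payoff per success (alpha = beta = 1)

    def xi(m, n):
        # For the uniform prior f(p) = 1, the Beta integrals give
        # xi(m, n) = (m + 1) / (m + n + 2).
        return (m + 1) / (m + n + 2)

    @lru_cache(maxsize=None)
    def V(k, m, n):
        """Optimal expected profit with k plays remaining in state (m, n)."""
        if k == 0:
            return 0.0
        play_a = S * ALPHA + V(k - 1, m, n)                      # state unchanged
        play_b = xi(m, n) * (ALPHA + V(k - 1, m + 1, n)) + \
                 (1 - xi(m, n)) * V(k - 1, m, n + 1)
        return max(play_a, play_b)

    def b_optimal(k):
        """Reachable states (m, n) with k plays left where playing B is optimal."""
        used = T - k                    # plays already consumed, so m + n <= used
        states = []
        for m in range(used + 1):
            for n in range(used + 1 - m):
                play_a = S * ALPHA + V(k - 1, m, n)
                play_b = xi(m, n) * (ALPHA + V(k - 1, m + 1, n)) + \
                         (1 - xi(m, n)) * V(k - 1, m, n + 1)
                if play_b >= play_a:
                    states.append((m, n))
        return states

    print("optimal expected profit:", V(T, 0, 0))
    for k in range(T, 0, -1):
        print(f"{k} plays left, play B at:", b_optimal(k))
    ```

    As a sanity check, with one play remaining at \((0,0)\) the comparison is \(s\alpha = 0.6\) against \(\xi(0,0)\beta = 0.5\), so the last play should always go to machine \(A\) in that state.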
  2. Consider the following modification of the sequential hypothesis testing problem considered in class. As in the model discussed in class, there are two hypotheses \(h_0\) and \(h_1\). The a priori probability that the hypothesis is \(h_0\) is \(p\).

    In contrast to the model discussed in class, there are \(N\) sensors. If sensor \(m\) is used at time \(t\) and the underlying hypothesis is \(h_i\), then the observation \(Y_t\) is distributed according to the pdf (or pmf) \(f^m_i(y)\). The cost of using sensor \(m\) is \(c_m\).

    Whenever the decision maker takes a measurement, he picks a sensor \(m\) uniformly at random from \(\{1, \dots, N\}\), observes \(Y_t\) distributed according to \(f^m_i(\cdot)\) (where \(h_i\) is the underlying hypothesis), and incurs a cost \(c_m\).

    The system continues for a finite time \(T\). At each time \(t < T\), the decision maker has three options: stop and declare \(h_0\), stop and declare \(h_1\), or continue to take another measurement. At time \(T\), the continue alternative is unavailable.

    1. Formulate the above problem as a POMDP. Identify an information state and write the dynamic programming decomposition for the problem.

    2. Show that the optimal control law has a threshold property, similar to the threshold property for the model considered in class.
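    The threshold structure can be checked numerically on a belief grid. The sketch below uses entirely hypothetical parameters (two Bernoulli sensors and unit misclassification costs; none of these numbers come from the assignment): the belief \(p = \PR(h_0 \mid \text{data})\) is the information state, the stopping cost is \(\min\{(1-p)L_1,\, pL_0\}\), and continuing averages over the uniformly chosen sensor.

    ```python
    import numpy as np

    # Hypothetical parameters: N = 2 Bernoulli sensors, horizon T,
    # unit misclassification costs.
    T = 10
    C = [0.01, 0.02]            # c_m: cost of using sensor m
    F0 = [0.8, 0.7]             # f^m_0(1) = P(Y = 1 | h_0) for sensor m
    F1 = [0.2, 0.4]             # f^m_1(1) = P(Y = 1 | h_1) for sensor m
    L0, L1 = 1.0, 1.0           # cost of a wrong declaration under h_0 / h_1

    p = np.linspace(0.0, 1.0, 501)             # belief grid, p = P(h_0 | data)
    stop = np.minimum((1 - p) * L1, p * L0)    # best stopping cost at belief p

    V = stop.copy()                            # at t = T the DM must stop
    for t in range(T - 1):                     # backward induction
        cont = np.zeros_like(p)
        for m in range(len(C)):
            cont += C[m] / len(C)              # sensor m is picked w.p. 1/N
            for f0, f1 in ((F0[m], F1[m]), (1 - F0[m], 1 - F1[m])):
                py = p * f0 + (1 - p) * f1     # P(Y = y | belief p); > 0 here
                post = p * f0 / py             # Bayes update of the belief
                cont += py * np.interp(post, p, V) / len(C)
        V = np.minimum(stop, cont)

    # Threshold structure: beliefs where continuing is optimal form one interval.
    region = np.flatnonzero(cont < stop)
    print("continue for p in [%.3f, %.3f]" % (p[region[0]], p[region[-1]]))
    ```

    The interval structure printed here is exactly the two-threshold property to be proved: stop and declare \(h_1\) below the lower threshold, stop and declare \(h_0\) above the upper one, and continue in between.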

  3. Let \(P\) and \(Q\) be \(n \times m\) row-stochastic matrices such that \(P \ge _ {tp} Q\) and let \(\pi\) and \(\mu\) be \(n \times 1\) probability vectors such that \(\pi \ge _ {r} \mu\). Show that \[ P^\TRANS \pi \ge _ {r} Q^\TRANS \mu.\]
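    A quick numerical illustration of the statement (the matrices and vectors below are my own examples, not from the assignment): \(\pi \ge_r \mu\) means the likelihood ratio \(\pi(i)/\mu(i)\) is nondecreasing in \(i\). Taking \(P = Q\) equal to a TP2 row-stochastic matrix (all \(2 \times 2\) minors nonnegative, so \(P \ge_{tp} P\) holds), the conclusion can be checked directly.

    ```python
    import numpy as np

    def lr_geq(pi, mu):
        """Check the likelihood-ratio order pi >=_r mu:
        pi(i) * mu(j) <= pi(j) * mu(i) for all i <= j
        (equivalently, pi/mu is nondecreasing)."""
        n = len(pi)
        return all(pi[i] * mu[j] <= pi[j] * mu[i] + 1e-12
                   for i in range(n) for j in range(i, n))

    # Example: P = Q, a TP2 row-stochastic matrix
    # (its 2x2 minor is 0.9*0.6 - 0.1*0.4 = 0.5 >= 0).
    P = np.array([[0.9, 0.1],
                  [0.4, 0.6]])
    pi = np.array([0.3, 0.7])   # pi/mu = [0.5, 1.75] is nondecreasing
    mu = np.array([0.6, 0.4])

    print(lr_geq(pi, mu))                 # True: hypothesis pi >=_r mu holds
    print(lr_geq(P.T @ pi, P.T @ mu))     # True: conclusion holds
    ```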

  4. Let \(X\), \(Y\), and \(Z\) be jointly Gaussian with \(Y\) and \(Z\) independent. Let \[X \sim {\cal N}(\mu_X, \Sigma_X), \quad Y \sim {\cal N}(\mu_Y, \Sigma_Y), \quad Z \sim {\cal N}(\mu_Z, \Sigma_Z)\] and \[ \text{cov}(X,Y) = \Sigma _ {XY}, \quad \text{cov}(X,Z) = \Sigma _ {XZ}, \quad \text{cov}(Y,Z) = 0.\] Define \[\hat X = \EXP[ X | Y, Z], \quad \tilde X = X - \hat X.\] Then, show that \[\begin{align} \hat X &= \mu_X + \Sigma_{XY} \Sigma_Y^{-1} (Y - \mu_Y) + \Sigma_{XZ} \Sigma_Z^{-1} (Z - \mu_Z) \\ \Sigma_{\tilde X} &= \Sigma_X - \Sigma_{XY} \Sigma_Y^{-1} \Sigma_{XY}^{\intercal} - \Sigma_{XZ} \Sigma_Z^{-1} \Sigma_{XZ}^{\intercal}. \end{align}\]
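    The key observation is that with \(W = (Y, Z)\), independence of \(Y\) and \(Z\) makes \(\Sigma_W\) block diagonal, so the joint conditioning gain \(\Sigma_{XW}\Sigma_W^{-1}\) splits into the two separate terms. A numerical check of this splitting (the dimensions and the construction via \(X = FY + GZ + E\) are my own choices, used only to generate a valid joint covariance):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    dx, dy, dz = 2, 3, 2          # hypothetical dimensions of X, Y, Z

    def random_cov(d):
        a = rng.standard_normal((d, 2 * d))
        return a @ a.T            # random positive-definite covariance

    # Construct a joint Gaussian with Y and Z independent:
    # X = F Y + G Z + E, with E independent of (Y, Z).
    Sy, Sz, Se = random_cov(dy), random_cov(dz), random_cov(dx)
    F = rng.standard_normal((dx, dy))
    G = rng.standard_normal((dx, dz))

    Sx  = F @ Sy @ F.T + G @ Sz @ G.T + Se
    Sxy = F @ Sy
    Sxz = G @ Sz

    # Joint conditioning on W = (Y, Z): gain = Sigma_{XW} Sigma_W^{-1}.
    Sw  = np.block([[Sy, np.zeros((dy, dz))],
                    [np.zeros((dz, dy)), Sz]])
    Sxw = np.hstack([Sxy, Sxz])
    gain = Sxw @ np.linalg.inv(Sw)

    # Because Sigma_W is block diagonal, the gain and the error covariance
    # both split into the two terms of the claimed formulas.
    gain_split = np.hstack([Sxy @ np.linalg.inv(Sy), Sxz @ np.linalg.inv(Sz)])
    err  = Sx - Sxw @ np.linalg.inv(Sw) @ Sxw.T
    err_split = Sx - Sxy @ np.linalg.inv(Sy) @ Sxy.T \
                   - Sxz @ np.linalg.inv(Sz) @ Sxz.T

    print(np.allclose(gain, gain_split), np.allclose(err, err_split))
    ```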

  5. Consider a linear system \[\begin{align} X _ {t+1} &= A _ t X _ t + B _ t U _ t + W _ t \\ Y _ t &= C _ t X _ t + N _ t \end{align}\]

    The primitive random variables \((X_1, W _ {1:T}, N _ {1:T})\) are jointly Gaussian. \(X_1\), \((W_1, N_1)\), \((W_2, N_2)\), ..., \((W_T, N_T)\) are independent. However, in contrast to the model considered in class, \((W_t, N_t)\) are not independent. In particular, \[X_1 \sim {\cal N}(0, \Sigma_X), \quad W_t \sim {\cal N}(0, \Sigma_W), \quad N_t \sim {\cal N}(0, \Sigma_N), \quad \text{cov}(W_t, N_t) = \Gamma.\]

    Show that the system can be written as \[\begin{align} \overline X_{t+1} &= \overline A_t \overline X_t + \overline B_t \overline U_t + \overline W_t \\ Y_t &= \overline C_t \overline X_t + \overline N_t \end{align}\] with an appropriate choice of \(\overline A_t\), \(\overline B_t\), \(\overline C_t\), \(\overline W_t\), \(\overline N_t\) such that \((X_1, \overline W_{1:T}, \overline N_{1:T})\) are independent and Gaussian and the observation process \(Y_{1:T}\) is identical in both cases.
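    A sketch of one standard first step (a hint, not the full solution): remove the correlation by subtracting from \(W_t\) its linear projection onto \(N_t\).

    ```latex
    % Decorrelation of the noises: define
    \overline W_t = W_t - \Gamma \Sigma_N^{-1} N_t .
    % Then \overline W_t is Gaussian and uncorrelated with N_t, since
    \operatorname{cov}(\overline W_t, N_t)
      = \Gamma - \Gamma \Sigma_N^{-1} \Sigma_N = 0 ,
    \qquad
    \Sigma_{\overline W}
      = \Sigma_W - \Gamma \Sigma_N^{-1} \Gamma^\intercal .
    ```

    Substituting \(W_t = \overline W_t + \Gamma \Sigma_N^{-1} N_t\) into the state equation then suggests how \(\overline A_t\), \(\overline C_t\), and the remaining quantities should be chosen.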