ECSE 506: Stochastic Control and Decision Theory

Aditya Mahajan
Winter 2016


Assignment 5 (practice problems)

\(\def\PR{\mathbb P} \def\EXP{\mathbb E} \def\IND{\mathbb 1} \def\TRANS{\intercal} \def\reals{\mathbb R} \def\integers{\mathbb Z}\)

Consider the following decentralized variation of sequential hypothesis testing.
- An unknown hypothesis \(H\) takes values \(h_0\) and \(h_1\) with \(\PR(H = h_0) = p\).
- There are two sensors indexed by \(i\), \(i \in \{1, 2\}\). The observations \(\{Y^i_t\}_{t \ge 1}\) of sensor \(i\) are i.i.d. and distributed according to the PDF \(f_0\) if the hypothesis is \(h_0\) and according to the PDF \(f_1\) if the hypothesis is \(h_1\). Conditioned on the hypothesis, the observations \(\{Y^1_t\}_{t \ge 1}\) and \(\{Y^2_t\}_{t \ge 1}\) are independent.
- The information structure is: \(I^i_t = \{Y^i_{1:t}\}\).
- At each time \(t\), \(t < T\), sensor \(i\) has three options: continue, stop and declare \(h_0\), stop and declare \(h_1\). At the terminal time \(T\), the sensor has only two options: stop and declare \(h_0\), stop and declare \(h_1\).
- Let \(\tau^i\) denote the stopping time when sensor \(i\) decides to stop.
- The objective is to choose control strategies at both sensors to minimize the total cost: \[ \EXP[ c^1 \tau^1 + c^2 \tau^2 + \ell(H, U^1_{\tau^1}, U^2_{\tau^2}) ]\] where \(\ell(h,u^1, u^2)\) is a coupled loss function.
1. Suppose the strategy of sensor 2 is fixed. Show that the problem of finding the best response at sensor 1 is a POMDP. Identify the information state and the dynamic program.
2. Use the dynamic program of the previous part to show that, for an arbitrary strategy of sensor 2, the best-response strategy of sensor 1 has a threshold property similar to the threshold property for the centralized sequential hypothesis testing problem.
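In centralized sequential hypothesis testing, the information state is the posterior belief \(\pi_t = \PR(H = h_0 \mid Y_{1:t})\). As a starting point for part 1, the same Bayes update can be tracked at sensor 1 using only its own observations. The sketch below is illustrative only: the Gaussian choices for \(f_0\) and \(f_1\) and the function names are assumptions, not part of the assignment, and the code does not by itself establish that this belief is the information state for the best-response problem.

```python
import math


def gaussian_pdf(y, mean, std=1.0):
    """Density of N(mean, std^2) at y (assumed form of f_0 and f_1)."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical observation densities: f_0 = N(0, 1), f_1 = N(1, 1).
def f0(y):
    return gaussian_pdf(y, 0.0)


def f1(y):
    return gaussian_pdf(y, 1.0)


def update_belief(pi, y, f0, f1):
    """One Bayes step for pi = P(H = h_0 | observations so far)."""
    num = pi * f0(y)
    den = num + (1.0 - pi) * f1(y)
    return num / den


def belief_trajectory(p, observations, f0, f1):
    """Return [pi_0, pi_1, ..., pi_t], starting from the prior pi_0 = p."""
    pi = p
    trajectory = [pi]
    for y in observations:
        pi = update_belief(pi, y, f0, f1)
        trajectory.append(pi)
    return trajectory
```

A threshold strategy of the kind asked about in part 2 would compare the current belief against two time-dependent thresholds and continue only while the belief lies between them. The thresholds themselves depend on the fixed strategy of sensor 2 and are not computed here.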
Consider the broadcast information structure with two users, indexed by \(i\), \(i \in \{0, 1\}\). The states and actions of user \(i\) are Euclidean vectors. The dynamics are given by: \[\begin{align} X^0_{t+1} &= A^0 X^0_t + B^0 U^0_t + W^0_t \\ X^1_{t+1} &= A^{10} X^0_t + A^{11} X^1_t + B^1 U^1_t + W^1_t \end{align}\] where \(A^0\), \(A^{10}\), \(A^{11}\), \(B^0\), and \(B^1\) are matrices of appropriate dimensions. The primitive random variables \(\{X^0_1, X^1_1, W^0_{1:T}, W^1_{1:T}\}\) are independent.
- The information structure is given by \[I^0_t = \{X^0_{1:t}\} \quad \text{and} \quad I^1_t = \{X^0_{1:t}, X^1_{1:t}\}\]
- The per-step cost is given by \((X^0_t)^\TRANS Q^0 (X^0_t) + (X^1_t)^\TRANS Q^1 (X^1_t) + (U^0_t)^\TRANS R^0 (U^0_t) + (U^1_t)^\TRANS R^1 (U^1_t)\), where \(Q^0\), \(Q^1\), \(R^0\), and \(R^1\) are symmetric positive definite matrices.
Simplify the dynamic program derived in class for the above LQG case. (Hint: This simplification is similar to that for the one-step delayed sharing information structure.)
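To get a feel for the broadcast information structure before simplifying the dynamic program, it can help to simulate the model. The sketch below is a minimal scalar version under stated assumptions: all states, actions, and system parameters are scalars, the noise is Gaussian, and the function name `simulate_broadcast` and its arguments are hypothetical. The policies passed in are arbitrary callables, not the optimal strategies. User 0 sees only \(X^0_{1:t}\), while user 1 sees both \(X^0_{1:t}\) and \(X^1_{1:t}\).

```python
import random


def simulate_broadcast(T, a0, a10, a11, b0, b1, q0, q1, r0, r1,
                       policy0, policy1, x0_init, x1_init,
                       noise_std=1.0, seed=0):
    """Simulate the scalar broadcast system for T steps; return the total cost.

    policy0(hist0) may use only I^0_t = X^0_{1:t}.
    policy1(hist0, hist1) may use I^1_t = (X^0_{1:t}, X^1_{1:t}).
    """
    rng = random.Random(seed)
    x0, x1 = x0_init, x1_init
    hist0, hist1 = [], []
    total_cost = 0.0
    for _ in range(T):
        hist0.append(x0)
        hist1.append(x1)
        u0 = policy0(hist0)
        u1 = policy1(hist0, hist1)
        # Per-step quadratic cost (scalar analogue of the matrix form).
        total_cost += q0 * x0 ** 2 + q1 * x1 ** 2 + r0 * u0 ** 2 + r1 * u1 ** 2
        w0 = rng.gauss(0.0, noise_std)
        w1 = rng.gauss(0.0, noise_std)
        # Both right-hand sides use the current x0, matching the dynamics.
        x0, x1 = (a0 * x0 + b0 * u0 + w0,
                  a10 * x0 + a11 * x1 + b1 * u1 + w1)
    return total_cost
```

Averaging the returned cost over many seeds gives a Monte Carlo estimate of the expected total cost of a given pair of policies, which can serve as a sanity check on any closed-form solution obtained from the simplified dynamic program.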
