\(\def\PR{\mathbb P} \def\EXP{\mathbb E} \def\IND{\mathbb 1} \def\TRANS{\intercal} \def\reals{\mathbb R} \def\integers{\mathbb Z}\)
Consider the following decentralized variation of sequential hypothesis testing.
An unknown hypothesis \(H\) takes values \(h_0\) and \(h_1\) with \(\PR(H = h_0) = p\).
There are two sensors indexed by \(i\), \(i \in \{1, 2\}\). The observations \(\{Y^i_t\}_{t \ge 1}\) of sensor \(i\) are i.i.d. and distributed according to the PDF \(f_0\) if the hypothesis is \(h_0\) and according to the PDF \(f_1\) if the hypothesis is \(h_1\). Conditioned on the hypothesis, the observation processes \(\{Y^1_t\}_{t \ge 1}\) and \(\{Y^2_t\}_{t \ge 1}\) are independent.
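Because each sensor's observations are conditionally i.i.d. given \(H\), the posterior \(\PR(H = h_0 \mid Y^i_{1:t})\) based only on sensor \(i\)'s own observations can be updated recursively with Bayes' rule, one observation at a time. The sketch below illustrates this update; the Gaussian densities \(f_0 = \mathcal N(0, 1)\) and \(f_1 = \mathcal N(1, 1)\) and the helper names `gaussian_pdf` and `posterior_h0` are hypothetical choices made only for illustration.

```python
import math

def gaussian_pdf(y, mean, std):
    """Density of N(mean, std^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def posterior_h0(p, ys, f0, f1):
    """P(H = h0 | Y_{1:t} = ys) for a single sensor with prior P(H = h0) = p.

    The observations are conditionally i.i.d. given H, so the belief is
    updated recursively with Bayes' rule, one observation at a time.
    """
    pi = p
    for y in ys:
        num = pi * f0(y)
        pi = num / (num + (1 - pi) * f1(y))
    return pi

# Hypothetical densities: f0 = N(0, 1) and f1 = N(1, 1); prior p = 0.5.
f0 = lambda y: gaussian_pdf(y, 0.0, 1.0)
f1 = lambda y: gaussian_pdf(y, 1.0, 1.0)
print(posterior_h0(0.5, [0.1, -0.3, 0.4], f0, f1))
```

For these densities each observation \(y\) shifts the log-odds \(\log \frac{\PR(H = h_0 \mid \cdot)}{\PR(H = h_1 \mid \cdot)}\) by \(1/2 - y\), so the three observations in the example give a posterior of \(1/(1 + e^{-1.3})\).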
The information structure is \(I^i_t = \{Y^i_{1:t}\}\), i.e., each sensor has access only to its own observations.
At each time \(t < T\), sensor \(i\) has three options: continue, stop and declare \(h_0\), or stop and declare \(h_1\). At the terminal time \(T\), a sensor that has not yet stopped has only two options: stop and declare \(h_0\), or stop and declare \(h_1\).
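Let \(U^i_t\) denote the action of sensor \(i\) at time \(t\), and let \(\tau^i\) denote the time at which sensor \(i\) decides to stop, so that \(U^i_{\tau^i} \in \{h_0, h_1\}\) is the hypothesis declared by sensor \(i\).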
The objective is to choose control strategies at both sensors to minimize the total expected cost \[ \EXP[ c^1 \tau^1 + c^2 \tau^2 + \ell(H, U^1_{\tau^1}, U^2_{\tau^2}) ]\] where \(\ell(h,u^1, u^2)\) is a coupled loss function.
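To make the objective concrete, the following sketch estimates the total expected cost by Monte Carlo simulation when both sensors use fixed two-threshold rules on their own posteriors. Every specific choice here is a hypothetical assumption made for illustration: unit-variance Gaussian observations, the threshold pairs, the forced decision rule at \(T\), and the coupled loss `coupled_loss` (0, 1, or 4 depending on how many sensors declare the wrong hypothesis). The code only evaluates given strategies; it does not compute optimal ones.

```python
import math
import random

def run_sensor(h, p, a, b, T, rng, mu0=0.0, mu1=1.0):
    """Simulate one sensor under a hypothetical two-threshold rule.

    Observations are N(mu0, 1) under h0 and N(mu1, 1) under h1 (an assumption).
    The sensor tracks pi = P(H = h0 | its own Y_{1:t}); before T it declares h0
    if pi >= b, declares h1 if pi <= a, and otherwise continues. At T it must
    declare, and picks the more likely hypothesis. Requires 0 < p < 1.
    Returns (tau, decision) with decision 0 for h0 and 1 for h1.
    """
    mean = mu0 if h == 0 else mu1
    log_odds = math.log(p / (1 - p))  # log of P(h0 | data) / P(h1 | data)
    for t in range(1, T + 1):
        y = rng.gauss(mean, 1.0)
        # log f0(y) - log f1(y) for unit-variance Gaussians
        log_odds += y * (mu0 - mu1) + (mu1 ** 2 - mu0 ** 2) / 2
        pi = 1 / (1 + math.exp(-log_odds))
        if t == T:
            return t, (0 if pi >= 0.5 else 1)
        if pi >= b:
            return t, 0  # stop and declare h0
        if pi <= a:
            return t, 1  # stop and declare h1

def coupled_loss(h, u1, u2):
    """Hypothetical coupled loss: two wrong declarations cost more than twice one."""
    errors = int(u1 != h) + int(u2 != h)
    return (0.0, 1.0, 4.0)[errors]

def estimate_cost(p, c1, c2, th1, th2, T, n=20000, seed=0):
    """Monte Carlo estimate of E[c1 tau1 + c2 tau2 + loss(H, U1, U2)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        h = 0 if rng.random() < p else 1
        # Given h, the two sensors draw independent observation streams.
        tau1, u1 = run_sensor(h, p, th1[0], th1[1], T, rng)
        tau2, u2 = run_sensor(h, p, th2[0], th2[1], T, rng)
        total += c1 * tau1 + c2 * tau2 + coupled_loss(h, u1, u2)
    return total / n

print(estimate_cost(p=0.5, c1=0.05, c2=0.05, th1=(0.1, 0.9), th2=(0.2, 0.8), T=20))
```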
Suppose the strategy of sensor 2 is fixed. Show that the problem of finding the best response at sensor 1 is a POMDP. Identify the information state and the dynamic program.
Use the dynamic program from the previous part to show that, for any arbitrary strategy of sensor 2, the best-response strategy of sensor 1 has a threshold property similar to that of the centralized sequential hypothesis testing problem.
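For comparison, the threshold property of the centralized problem can be checked numerically. The sketch below solves the finite-horizon dynamic program for a single decision maker with binary observations, where \(\pi = \PR(H = h_0 \mid \text{data})\), declaring \(h_0\) costs \(L_0 (1 - \pi)\) in expectation, and declaring \(h_1\) costs \(L_1 \pi\). The binary PMFs, costs, horizon, grid, and function names are hypothetical. It illustrates the centralized benchmark only, not the best-response dynamic program asked for above.

```python
def stop_cost(pi, L0, L1):
    """Expected loss of stopping now: declare h0 (wrong w.p. 1 - pi) or h1 (wrong w.p. pi)."""
    return min(L0 * (1 - pi), L1 * pi)

def value(t, pi, T, c, L0, L1, f0, f1):
    """Cost-to-go V_t(pi), excluding the observation cost c * t already paid."""
    if t == T:
        return stop_cost(pi, L0, L1)  # at T the only options are to stop
    return min(stop_cost(pi, L0, L1), continue_cost(t, pi, T, c, L0, L1, f0, f1))

def continue_cost(t, pi, T, c, L0, L1, f0, f1):
    """c + E[V_{t+1}(pi')], where pi' is the Bayes update after observing y in {0, 1}."""
    total = c
    for y in (0, 1):
        py = pi * f0[y] + (1 - pi) * f1[y]  # predictive probability of y
        if py > 0:
            total += py * value(t + 1, pi * f0[y] / py, T, c, L0, L1, f0, f1)
    return total

def continuation_set(t, T, c, L0, L1, f0, f1, grid=101):
    """Grid points pi at which continuing is strictly cheaper than stopping at time t < T."""
    pis = [k / (grid - 1) for k in range(grid)]
    return [pi for pi in pis
            if continue_cost(t, pi, T, c, L0, L1, f0, f1) < stop_cost(pi, L0, L1)]

# Hypothetical PMFs: f0 = (P(y=0 | h0), P(y=1 | h0)), f1 likewise under h1.
f0, f1 = (0.8, 0.2), (0.3, 0.7)
region = continuation_set(t=1, T=8, c=0.05, L0=1.0, L1=1.0, f0=f0, f1=f1)
print(min(region), max(region))
```

For these parameters the continuation region on the grid is a single interval \((a, b)\) containing \(\pi = 1/2\): stop and declare \(h_1\) when \(\pi \le a\), stop and declare \(h_0\) when \(\pi \ge b\). The recursion enumerates all observation paths, so its cost grows as \(2^{T - t}\); this is acceptable only for small horizons like the one above.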
Consider the broadcast information structure with two users, indexed by \(i\), \(i \in \{0, 1\}\). The states and actions of user \(i\) are Euclidean vectors. The dynamics are given by: \[\begin{align} X^0_{t+1} &= A^0 X^0_t + B^0 U^0_t + W^0_t \\ X^1_{t+1} &= A^{10} X^0_t + A^{11} X^1_t + B^1 U^1_t + W^1_t \end{align}\] where \(A^0\), \(A^{10}\), \(A^{11}\), \(B^0\), and \(B^1\) are matrices of appropriate dimensions. The primitive random variables \(\{X^0_1, X^1_1, W^0_{1:T}, W^1_{1:T}\}\) are independent.
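The dynamics can also be written in stacked form \(X_{t+1} = A X_t + B U_t + W_t\) with \(X_t = [X^0_t; X^1_t]\) and \(U_t = [U^0_t; U^1_t]\), where \(A\) is block lower-triangular because \(X^0\) does not depend on \(X^1\), and \(B\) is block-diagonal. The sketch below implements one step of the two-user dynamics and the stacked matrices; the dimensions, matrix values, zero controls, and noise scale are hypothetical.

```python
import numpy as np

def step(x0, x1, u0, u1, w0, w1, A0, A10, A11, B0, B1):
    """One step of the two-user dynamics; x1 depends on x0 but not vice versa."""
    x0_next = A0 @ x0 + B0 @ u0 + w0
    x1_next = A10 @ x0 + A11 @ x1 + B1 @ u1 + w1
    return x0_next, x1_next

def stacked(A0, A10, A11, B0, B1):
    """(A, B) with X_{t+1} = A X_t + B U_t + W_t for X = [X0; X1], U = [U0; U1]."""
    n0, n1 = A0.shape[0], A11.shape[0]
    m0, m1 = B0.shape[1], B1.shape[1]
    A = np.block([[A0, np.zeros((n0, n1))], [A10, A11]])        # block lower-triangular
    B = np.block([[B0, np.zeros((n0, m1))], [np.zeros((n1, m0)), B1]])  # block-diagonal
    return A, B

# Hypothetical dimensions: X0 in R^2, X1 in R^1, scalar controls.
rng = np.random.default_rng(0)
A0, A10, A11 = 0.9 * np.eye(2), np.array([[0.5, -0.2]]), np.array([[0.8]])
B0, B1 = np.array([[1.0], [0.0]]), np.array([[1.0]])
x0, x1 = rng.standard_normal(2), rng.standard_normal(1)
for t in range(5):
    u0, u1 = np.zeros(1), np.zeros(1)  # placeholder controls
    w0, w1 = 0.1 * rng.standard_normal(2), 0.1 * rng.standard_normal(1)
    x0, x1 = step(x0, x1, u0, u1, w0, w1, A0, A10, A11, B0, B1)
A, B = stacked(A0, A10, A11, B0, B1)
print(A.shape, B.shape)
```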
The information structure is given by \[I^0_t = \{X^0_{1:t}\} \quad \text{and} \quad I^1_t = \{X^0_{1:t}, X^1_{1:t}\}\]
The per-step cost is given by \((X^0_t)^\TRANS Q^0 (X^0_t) + (X^1_t)^\TRANS Q^1 (X^1_t) + (U^0_t)^\TRANS R^0 (U^0_t) + (U^1_t)^\TRANS R^1 (U^1_t)\), where \(Q^0\), \(Q^1\), \(R^0\), and \(R^1\) are symmetric positive definite matrices.
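The per-step cost is a sum of four quadratic forms. A minimal sketch that evaluates it and checks the symmetric positive definiteness assumption via a Cholesky factorization; the helper names and the matrices and vectors in the example are hypothetical.

```python
import numpy as np

def per_step_cost(x0, x1, u0, u1, Q0, Q1, R0, R1):
    """(X0)'Q0 X0 + (X1)'Q1 X1 + (U0)'R0 U0 + (U1)'R1 U1 for one time step."""
    return float(x0 @ Q0 @ x0 + x1 @ Q1 @ x1 + u0 @ R0 @ u0 + u1 @ R1 @ u1)

def is_symmetric_pd(M):
    """True if M is symmetric and positive definite (Cholesky succeeds)."""
    if not np.allclose(M, M.T):
        return False
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False

# Hypothetical weights and vectors: X0 in R^2, X1, U0, U1 scalar.
Q0, Q1, R0, R1 = np.eye(2), np.array([[2.0]]), np.array([[1.0]]), np.array([[4.0]])
assert all(is_symmetric_pd(M) for M in (Q0, Q1, R0, R1))
print(per_step_cost(np.array([1.0, 2.0]), np.array([3.0]), np.array([1.0]), np.array([0.5]),
                    Q0, Q1, R0, R1))  # 5 + 18 + 1 + 1 = 25.0
```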
Simplify the dynamic program derived in class for the above LQG case. (Hint: This simplification is similar to that for the one-step delayed sharing information structure.)
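A useful reference point when checking a simplified dynamic program is the centralized benchmark, in which a single controller observes the full stacked state \(X_t = [X^0_t; X^1_t]\). This is NOT the information structure above, but its optimal cost is a lower bound on the decentralized optimum, since a centralized controller could imitate any decentralized strategy. The sketch below is the standard finite-horizon Riccati recursion for the stacked system, assuming zero-mean noise (the problem does not state the noise mean) and no terminal cost; the numerical matrices and the function name `riccati` are hypothetical.

```python
import numpy as np

def riccati(A, B, Q, R, T):
    """Backward Riccati recursion for the centralized benchmark.

    Minimizes E[sum_{t=1}^T X_t'Q X_t + U_t'R U_t] for X_{t+1} = A X_t + B U_t + W_t
    when one controller sees the full state X_t. With independent zero-mean noise
    (an assumption), W_t only adds a constant to the cost and leaves the gains
    unchanged. Returns P[1..T+1] with P[T+1] = 0, and gains K[1..T] with U_t = -K_t X_t.
    """
    n = A.shape[0]
    P = [None] * (T + 2)
    K = [None] * (T + 1)
    P[T + 1] = np.zeros((n, n))
    for t in range(T, 0, -1):
        S = R + B.T @ P[t + 1] @ B
        K[t] = np.linalg.solve(S, B.T @ P[t + 1] @ A)
        Pt = Q + A.T @ P[t + 1] @ A - A.T @ P[t + 1] @ B @ K[t]
        P[t] = (Pt + Pt.T) / 2  # symmetrize against round-off
    return P, K

# Hypothetical stacked system: block lower-triangular A, block-diagonal B, Q, R.
A = np.array([[0.9, 0.0, 0.0], [0.0, 0.9, 0.0], [0.5, -0.2, 0.8]])
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
Q = np.diag([1.0, 1.0, 2.0])
R = np.diag([1.0, 4.0])
P, K = riccati(A, B, Q, R, T=10)
print(K[1])
```

At the last step \(P_T = Q\) and \(K_T = 0\), since no future cost remains to be influenced by \(U_T\).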