1 a) Either of the following: 1100101111100110 0011010000011001 1,2,5,7,8,9,10,11,14,15 3,4,6,12,13,16 b) Gap lengths: 1,1,3,2,1,1,1,1,3,1 3,1,2,6,1,3 MARKING SCHEME: one point for each. 2.) Take directly from lecture notes. 3.) a) a,b,a,aa,ba,bab,b length offset/new 0 a 0 b 1 2 2 1 2 4 3 2 1 1 b) totbits depends on the number of bits used for each phrase but also on the number of phrases phi(n). As n increases, the number of symbols in each phrase increases as well, since more phrases have been seen in the past and hence the chance of matching to a longer phrase is greater. The key observation is that phi(n) increases very slowly with n, and counteracts the growth of log i with n c) sum_i (log i + log N), rather than sum_i (log i + 1) To improve further, we could use a variable length code to say what the new symbol is, instead of a fixed length code with log N bits. MARKING SCHEME: two points for (a), one for (b), two for (c). 4.) a) p(X_2 | X_j = 1) = [ 1/8 5/8 2/8] b) p(X_2 = 1, X_3 = 1 | X_1 = 3) = p(X_3 = 1 | X_2 = 1) * p(x_2 = 1 | X_1 = 3) = (6/8) * (1/8) MARKING SCHEME: one point for each. 5.) a) k = 112 b) p(X_k+1 | X_k = a) = [4/19 15/19] Some students gave answer [4/20 15/20]. I took off half a point unless student clarified why the probabilities add up to 19/20 instead of 20/20. Note I did not ask for probabilities of the next thing that is encoded (including ESCAPE or EOF) but rather probabilities of next element in the sequence. My apologies if you find this annoying.. I am not entirely comfortable taking off the half point. c) [3/13 10/13] d) The current context is ...ba_____ Using ppm, the encoder sends "a ESC a" to encode the next two symbols (aa). MARKING SCHEME: one point for each.