Final Exam Solutions

---------------- Question 1 ---------------------

a)   The most probable sequence is  000.   The tag is 125/1024.

b)   The least probable sequence is  111.   The tag is 997/1024.

Note:  you didn't need to go through the l_k,u_k computation
for (a) and (b) since the sequences were the first and last
of the eight possible.  If you understand how the cumulative
distribution works, then you only needed to compute the probability
of the two sequences and work from there.   i.e. for (a),
p(000) = F(000) = 125/512,  so T(000) = (0 + 125/512) / 2.
Similarly for (b),  T(111) =  (1 + 1 - (3/8)^3) / 2.

c)  Here you need to compute l_k and u_k.   You get
    (l_3, u_3)  = (200/512,   275/512)  so T(x) = 475/1024
    which in binary is .0111011011

    Also, since p(x) = 75/512,  we conclude there are 4 bits
    so C(x) = 0111

MARKING SCHEME.   

1 points for each of (a),(b) and 5 points for (c).
The 5 points for (c) are:  2 points for l_3,u_3,  1 point for p(x), 
1 point for "4 bits", and 1 point for the tag T(x) in binary.
   
---------------- Question 2 ---------------------

a)    ..., -12, -8, -4, 0, 4, 8, 12, ....

b)    Rather than having infinitely many possible values, we would
    only have a finite number.  For example, if N = 5, we would 
    have  -8, -4, 0, 4, 8.

c)   value to be coded       level       decoder's estimate
     ----------------        -----       ------------------
         25		      6 	   24
        -5.9  (24-18.1)       -1           20
        -8.7  (20-11.3)       -2           12

d)   If we assume 25 is quantized then ...

     value to be coded       level       decoder's estimate
     ----------------        -----       ------------------
         25		      		   24
         6.1  (18.1-12)       2            20  (12 + 8)
        -5.7  (11.3-10)       0            10  (10 + 0) 

    ... but if we assume 25 is sent losslessly then (which is not
        what I wanted, but which I suppose you could do !!!) ...

e)   The answer I was looking for was:

       sum_j  ( x_{j+1} - x_j / 2 )^2  <  sum_j  ( x_{j+1} - x_j)^2  

MARKING SCHEME:

   1 point for each of (a), (b).  3 points for (c,d).  1 point for e.
   Some details:

     c) 
     If you assumed that the value 25 was encoded losslessly, then
     then answer would be slightly different:

      value to be coded       level       decoder's estimate
     ----------------        -----       ------------------
         25		      		   25
        -6.9  (25-18.1)       -2           17  (25 - 8)
        -5.7  (17-11.3)       -1           13  (17 - 4) 

     d)
     value to be coded       level       decoder's estimate
     ----------------        -----       ------------------
         25		       		   25
         5.6  (18.1- 25/2)     1           16.5  (25/2 + 4 = 16.5)
        3.05 (11.3- 16.5/2)    1           12.05  (10 + 0) 


   e)  Some students noted that     sum_j  ( x_{j+1} - x_j)^2  would
     be zero if each element of the sequence was half as great as
     the previous element.  That's true, but this is far too specific.
     I asked you to *characterize*, which means say something general.
     I have 1/2 point for this.

--------------Question 3------------------
 
(a)  (2 marks)  
     This was straightforward from the notes.  I expected everyone to get it.
     I gave 2 marks for the 8 sketches, as long as the functions for k > 0
     had positive and negative values.  (Some students drew them with positive
     values only.)  

(b)   (3 marks)

This question was not easy.  It was meant to test who had understand what 
the 2D DCT does.  

If you compute the 2D DCT and round the numbers, you get:

   570  -258   -13   107   -10   -56    -5    57
     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0  

Of course, I was not expecting exact numbers.  As I emphasized 
in the question, I wanted a rough estimate only.

MARKING SCHEME:  I gave 1 point if you wrote that the values
are highest at the top left and lowest at the bottom right.  This
is typically the case for general images, but it is not the case 
for this particular image block, which has roughly the same values in
each row.  The difference here is that  the rows have a different
structure than the columns.  I wanted to see who recognized this
and who could interpret it in terms of the DCT.

I gave two points if you noted that the rows should be treated
differently from columns and if you wrote that the values should fall 
from left to right (rather than diagonally).

I gave the full three points if you noticed that the values are
approximately 0 in all rows except for the first. 

(c)  JPEG uses differential coding for DC values because the DC
values are approximately the same from one block to the next in
typical images.  This was discussed in class.   JPEG doesn't
use differential coding for AC values because AC values don't
tend to be the same from one block to the next.   The implicit
assumption that is being made about AC values is that they are
uncorrelated - the same assumption is made both within and between
blocks.  (This is the basic assumption made in all transform
coding!)

MARKING SCHEME:  1 point for DC.  1 point for AC.  I gave only .5
point for AC if you said that AC values tend to be small.  

-------------   QUESTION 4 ----------------------------------

(a)  (4 points)

- The audio signal is broken into small blocks, 32 samples each.

- These 32 samples for each block j are transform coded by DCT
to give 32 "Y_j(k)" coefficients.

- For each k, the Y_j(k) are quantized using a Delta value.  The
Delta value, Delta(k), is constant for 16 blocks (representing 512
samples).    These Delta(k) values must be transmitted, along
with the  levels, l_j(k) round( Y_j(k) / Delta(k) ),  to the decoder.

- The decoder  takes the levels l_j(k),  multiplies by Delta(k), then
applies the inverse DCT to get back the original signal.

(b)  (3 points)

Masking is a phenemonon in human hearing, whereby the presence
of one sound component can prevent another from being heard.  

Masking is frequency dependent.   Two sound components
that have similar frequencies tend to mask each other more than 
sound components that have different frequencies.   

How is masking used in MP3?   Large blocks (512 samples) are analyzed for their frequency
components and Delta values (for quantizing the DCT transformed small
blocks) are chosen based on these components and an estimate of how 
much they mask each other.

MARKING SCHEME:  for (a) and (b), several students did not mention
the use of blocks (small vs. large).   I gave only part marks
for no mention of blocks. 

(c)  (3 points)

This question was more challenging.   
I wanted to see if you understood the basics of MPEG, and if you could
apply the concepts to a similar problem.    Details I was looking
for include:

-  MPEG breaks each image into blocks, and then looks for the best
match from each block from a previously encoded frame.   It does so
by sending an offset vector and a set of intensity differences.

- The offset gives the location of the best match for the block in
the previously encoded frame.  (Some students suggested the offset
should point to a *block*.  But this is inapprpriate since there
is no reason why the offset should align well with another entire
block.)

- To apply this concept to audio, suppose the left channel has been encoded and the right
channel is to be encoded, given the left channel.  The right channel could be 
partitioned into blocks, and each block could be encoded with an offset
vector (right, relative to left)  and a difference vector.