Sequential Files

  1. In order to perform master file update with a single pass over the master file and a single pass over the transaction file, reading every record of each file at most once, which of the following conditions must hold?

  2. If the file of mutual funds transactions from assignment #1 was sorted chronologically (sorted by time) rather than by investor account number, the general master file update algorithm as presented in class would no longer be sufficient to perform the update of the investor master file. Describe two possible solutions.

  3. a) A sorted master file consisting of 1000 records and an unsorted transactions file, containing only change transactions (i.e. no insertions or deletions), are stored on disk. You are asked to use the on-line update method to apply the transactions file to the master file, ensuring that if a disk crash occurs, you can later recover the original master file contents. Assuming that only read operations count as record accesses, what is the maximum number of records in the transactions file you should accept before suggesting that the on-line file update method would be more efficient?

    b) How would your answer to (a) change if a copy of the original master file were not required?

    c) How would your answer to (a) change if the transactions file was initially sorted?

    d) Explain why your answer to (a) would change if a large number of transactions were insertion operations. You do not need to provide any math for (d).

  4. A sorted master file consisting of 1000 records and an unsorted differential file consisting of 128 records are stored on disk. You have just received a request to locate the record with key K. Assuming that the record is in the master file, but you first check the differential file, approximately how many disk accesses in total will it take before you find the desired record?

  5. How would this change if you were using a Bloom Filter?

  6. Why could you not use binary search to locate entries in the original funds file supplied for the first assignment?

  7. Using a Bloom filter, why can we not be sure that a record with a certain key exists in a differential file, even if the appropriate bits in the bitmap have been set to 1?

  8. What is wrong with using the following two functions as hashing functions to set bits in a Bloom Filter:
    	h1 = K mod M 
    	h2 = (K+3) mod M 
    
    where K is the record key and M is the size of the bitmap.

  9. Can you suggest a better function for h2?