File Structures

Hashing

What problem that both linear and non-linear hashing suffer from is solved by double hashing?
- False alarms
- Primary clustering
- Secondary clustering
Given 5 buckets, numbered 0, 1, 2, 3, 4, each with a capacity for two records, show the result of inserting the records with the keys: 16, 7, 15, 34, 24, 25, 75, 14, 39, into the appropriate buckets, using the linear probing hashing function
```
	h(K) = [K + (i-1)] mod 5
```
On average, how many buckets had to be read to insert each record?
How would this number change (calculate it) if we used instead the non-linear probing hashing function h(K) = [K + (i-1)²] mod 5 to insert the same records?
Double hashing overcomes secondary clustering by using two different hashing functions as in: h(k) = (h1(k) + (i-1)h2(k)) mod N. Is it possible to overcome secondary clustering with only a single hashing function? If so, describe the method, and if not, explain why.
What is the advantage of using a direct file, in which the location of flight reservations is based on a hash value of the reservation number, for storing records in the term project?

Virtual Hashing

In the virtual hashing method with a total of M initial buckets, and an initial hashing function of h(K) = K mod M which of the following occurs when a bucket B overflows for the second time?
- Bucket 2B is allocated.
- Bucket B + 2M is allocated.
- Bucket B + 4M is allocated.
- Rehash some records using the hashing function h(K) = K mod 2M.
- Rehash some records using the hashing function h(K) = K mod 4M.
- Rehash some records using the hashing function h(K) = K mod 8M.

Table-assisted Hashing

A certain file is stored in buckets using the simple table-assisted hashing method described in class. There are four buckets, each of which can hold up to 3 records. The initial contents of this file and of the maxKey table are shown in the figure below. The hash function is h(K,i) = (K + i-1) mod 4. Show the state of the file and of the maxKey table following the insertion of a record with key 28 into the file.

Bucket maxKey contents

0 16 4 16 8

1 43 9 43 25

2 26 18 26

3 19 7 3 19
What is the maximum number of buckets that must be accessed during the search for a record?
When a record to be inserted hashes to a full bucket, where is the record placed?
Is it possible for the insertion of a record R to fail even though there is enough room in the file to accommodate a record? Explain.
In the table-assisted hashing method the attempted insertion of a record, R, into a full bucket may require that another record, R' be removed from that bucket and re-inserted elsewhere. Is it possible that this re-insertion of record R' would require record R to be re-inserted, and so on, thus resulting in an infinite loop? Explain.
Write a pseudo-code routine for the deletion of a record from a direct file, implemented using the table-assisted hashing method.

Bucket	maxKey	contents
0	16	4 16 8
1	43	9 43 25
2	26	18 26
3	19	7 3 19

Dynamic Hashing

For the dynamic hashing method, a PseudoRandom function, B(K,i), is required to guide the traveral of a binary tree towards the bucket containing the record with key K. The function outputs either 0 or 1, based on the values of the key, K, and the tree level, i.

What is wrong with using the following function for B(K,i)?

   int B(int K, int i) {
      int pos, bit; 

      pos = 1 << i; /* calculates 2^i */
      bit = (K & pos) >> i; /* obtains the ith bit of K */ 
      return (bit); 
   }

Write a good function for B(K,i).
A certain file is stored in buckets as determined using the method of dynamic hashing. The hash function is h(K) = K mod 4 and the first few bits of the B string associated with each key in the file are given in the table below. Each bucket can hold up to 2 records. Show the state of the directory and the file after the records with keys 17, 13, 16, 5, and 25 are inserted.

Key B

17 01101

13 00101

16 10110

5 11110

25 01001
What major issue can we ignore when when deleting from a direct file using the dynamic hashing method instead of simple open addressing?
What is the major similarity between table-assisted hashing and dynamic hashing? What advantage is there to using dynamic hashing?

Key	B
17	01101
13	00101
16	10110
5	11110
25	01001