Hash Tables

We can only do checks for equality with hash tables. We can't do range searches efficiently.

In hash indexes, search keys are distributed uniformly across "buckets" using a "hash function".

The way that hashing works is that we have a hash function h and it's going to map from our Universe to the set of buckets, so to these m different values. A key value x, and h(x) is going to be a value between 1 to m. The buckets are going to contain index entries, not data. The buckets are going to lead us to the actual data by having the key value along with a pointer to the actual data.

Collision is when different values from U have the same hash value. Bad things that can happen in hashing are that we might have a bad hash function in which lots of values hash to a few buckets - collision. Thus, we would like h to spread them out evenly.

To handle collisions, we use linked list - making a linked list of things that have the same hash value.

  1. We'll put multiple things in a bucket, so a bucket won't just hold a single item, it will hold a block full of items.

  2. When a bucket fills up because we have a lot of collisions, we can chain together some additional blocks within a bucket.

If we're unlucky, we're going to get long chains within one or a couple of buckets, and nothing or very little in other buckets. We are stuck with searching sequentially through these long chains. If we're lucky and we just have short chains, this is going to be very efficient.

Last updated