Open addressing
- is another approach to solve collisions
- no chaining; instead, all items stored in table
- one item per slot ⇒ $m ≥ n$
- hash function will specify the order of slots to probe (try) for a key (for insert/search/delete), not just one slot. (since when collision occurs, you need to have further trials to find an spare slot)
- in math notation: to design a function $h$, with the property that for all $k ∈ U$:
- the universe of keys $\times$ trial count, mapps to the slot in table
$$h: U \times \{0,1,…,m-1\} \to \{0,1,…,m-1\}$$
- the universe of keys $\times$ trial count, mapps to the slot in table
- $\langle h(k, 0), h(k, 1), . . . , h(k, m − 1)\rangle$ is a permutation of 0, 1, …, m − 1. That means if keep trying h(k, i) for increasing i, we will eventually hit all slots of the table.
Operations are as follows:
- Insert(k,v): Keep probing until an empty slot is found. Insert item into that slot.
- Search(k): As long as the slots you encounter by probing are occupied by keys = k, keep probing until you either encounter k or find an empty slot—return success or failure respectively.
- Deleting Items: rather than setting none in the slot of the deleted item, replace it with special flag: “DeleteMe”, which Insert treats as None but Search doesn’t
Probing strategies
Linear Probing
$h(k, i) = (h’(k) +i)\; mod\; m$ where h’(k) is ordinary hash function
But have a problem: clustering—cluster: consecutive group of occupied slots as clusters become longer, it gets more likely to grow further.
Double Hashing
$h(k, i) =(h_1(k) +i·h_2(k)) \; mod\; m$ where $h_1(k)$ and $h_2(k)$ are two ordinary hash functions.
- actually hit all slots (permutation) if $h_2(k)$ is relatively prime to m for all k
Uniform Hashing Assumption
- is not Simple Uniform Hashing Assumption
- Each key is equally likely to have any one of the m! permutations as its probe sequence(not really true, but double hashing can come close)
Open Addressing vs. Chaining
- Open Addressing: better cache performance (better memory usage, no pointers needed)
- Chaining: less sensitive to hash functions (OA requires extra care to avoid clustering) and the load factor α (OA degrades past 70% or so and in any event cannot support values larger than 1)
Cryptographic Hashing
A demo of different probng tsrategies can be found here