A hash table is a data structure that lets you store and retrieve items extremely quickly, usually in constant time on average. The basic idea is simple: each item has a key, and a hash function turns that key into a number. That number is then reduced with modulo (hash(key) mod m
) to decide which bucket (or slot) in an underlying array the item belongs to. For example, the string "cat"
can be hashed into a number and placed in a specific bucket of the table.
Because the number of possible keys is much larger than the number of available buckets, collisions are unavoidable — different keys may end up in the same bucket. To handle this, hash tables use strategies such as separate chaining, where each bucket points to a small list of entries that share that slot, or open addressing, where items are stored directly in the array and collisions are resolved by probing forward until an empty slot is found.
Hash tables can operate in two modes: a hash set, which stores unique keys only (e.g., {"cat", "dog"}
), or a hash map, which stores key–value pairs (e.g., {"cat": 3, "dog": 5}
). To keep performance fast, hash tables also monitor their load factor (the ratio of items to buckets). If it grows too large, the table is resized and all items are rehashed into a bigger array.
The strength of hash tables is their speed: insertion, lookup, and deletion are typically O(1)
on average. They are widely used in applications that need fast membership checks and quick key-based lookups, such as dictionaries, compiler symbol tables, or web caches. However, they are not ideal when data must remain in sorted order or when range queries are required — in those cases, balanced trees or ordered maps are a better choice.
size / buckets
. Keep α low (≈0.75 for open addressing; 1–5+ okay for chaining) to maintain O(1) average time.insert/find/erase
are typically O(1); worst-case is O(n) if everything collides or a pathological hash is used.Operation | Separate chaining | Open addressing (linear) | Notes |
---|---|---|---|
insert/put(k,v) |
O(1) | O(1) | Rehash when α high; linear probing clusters as α→1 |
find/get(k) |
O(1) | O(1) | Expected probe length ~ 1/(1-α) for linear probing |
erase(k) |
O(1) | O(1) | Open addressing uses tombstones to maintain probe sequences |
rehash |
O(n) | O(n) | Resizes are occasional; amortized costs stay O(1) |
set
(hash set), dict
(hash map)S = {"cat", "dog"} # hash set (unique keys) M = {"cat": 3, "dog": 5} # hash map (key → value) "cat" in S # O(1) average M.get("cat", 0) # 3 M["dog"] = 6 # update
HashSet<E>
and HashMap<K,V>
import java.util.*; HashSet<String> S = new HashSet<>(); S.add("cat"); S.add("dog"); boolean has = S.contains("cat"); HashMap<String,Integer> M = new HashMap<>(); M.put("cat", 3); M.put("dog", 5); int v = M.getOrDefault("cat", 0);
std::unordered_set
and std::unordered_map
#include <unordered_set> #include <unordered_map> using namespace std; unordered_set<string> S; S.insert("cat"); S.insert("dog"); bool has = S.count("cat"); unordered_map<string,int> M; M["cat"] = 3; M["dog"] = 5; int v = M.at("cat");