Introduction to Bloom Filter
A Bloom filter is a probabilistic data structure used to test whether an element is possibly in a set or definitely not in a set.
It’s designed to be memory-efficient and very fast, but it allows false positives (saying an element might be in the set when it’s not) while guaranteeing no false negatives (if it says something is not in the set, it really isn’t).
How it works:
-
You start with a bit array (all bits set to 0).
-
You have k different hash functions.
-
To add an item:
-
Compute its k hashes.
-
Set the corresponding k positions in the bit array to 1.
-
-
To check if an item is in the set:
-
Compute its k hashes.
-
If all those positions are 1 → item might be in the set (possible false positive).
-
If any position is 0 → item is definitely not in the set.
-
Example:
-
Suppose you insert
"cat".
Hash functions map"cat"to positions [3, 7, 12], so you set those bits. -
Later, you check
"dog".
Its hashes are [3, 12, 20]. Since positions 3 and 12 are already set (from"cat"), and 20 is also set (maybe by"fish"), the filter might say"dog"is in the set—even if you never added it.
Key properties:
-
Fast lookups: O(k) time (constant if k is fixed).
-
Space-efficient: Much smaller than storing full items.
-
No deletions (in the basic version, though variants like counting Bloom filters exist).
-
False positives possible, false negatives impossible.
Common use cases:
-
Web browsers: Checking if a URL is in a blacklist.
-
Databases: Quickly filtering out keys that aren’t present.
-
Networking: Cache membership checks, routing, etc.
-
Big data systems (e.g., Hadoop, Cassandra, Bitcoin).
Recommended articles:
https://www.baeldung.com/cs/bloom-filter
Comments
Post a Comment