Introduction to Bloom Filter

- September 21, 2025

A Bloom filter is a probabilistic data structure used to test whether an element is possibly in a set or definitely not in a set.

It’s designed to be memory-efficient and very fast, but it allows false positives (saying an element might be in the set when it’s not) while guaranteeing no false negatives (if it says something is not in the set, it really isn’t).

How it works:

You start with a bit array (all bits set to 0).
You have k different hash functions.
To add an item:
- Compute its k hashes.
- Set the corresponding k positions in the bit array to 1.
To check if an item is in the set:
- Compute its k hashes.
- If all those positions are 1 → item might be in the set (possible false positive).
- If any position is 0 → item is definitely not in the set.

Example:

Suppose you insert "cat".
Hash functions map "cat" to positions [3, 7, 12], so you set those bits.
Later, you check "dog".
Its hashes are [3, 12, 20]. Since positions 3 and 12 are already set (from "cat"), and 20 is also set (maybe by "fish"), the filter might say "dog" is in the set—even if you never added it.

Key properties:

Fast lookups: O(k) time (constant if k is fixed).
Space-efficient: Much smaller than storing full items.
No deletions (in the basic version, though variants like counting Bloom filters exist).
False positives possible, false negatives impossible.

Common use cases:

Web browsers: Checking if a URL is in a blacklist.
Databases: Quickly filtering out keys that aren’t present.
Networking: Cache membership checks, routing, etc.
Big data systems (e.g., Hadoop, Cassandra, Bitcoin).

Recommended articles:
https://www.baeldung.com/cs/bloom-filter

Search This Blog

devblogvn

Introduction to Bloom Filter

How it works:

Example:

Key properties:

Common use cases:

Comments

Post a Comment

Popular posts from this blog

REST vs RPC vs GraphQL: Choosing the Right API Style

Fibonacci Agile Estimation

How to Add LICENSE.txt to Your .NET Project Using Azure Pipelines