DSA

Bloom Filters in Java: Probabilistic Data Structures

Master Bloom Filters in Java. Learn how this memory-efficient probabilistic data structure checks for set membership with zero false negatives and controllable false positives.

Sachin Sarawgi·April 19, 2026·3 min read
#dsa#java#algorithms#bloom filter#probabilistic#hashing#interview preparation

A Bloom Filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set.

It is unique because it can give two types of answers:

  1. "Possibly in set": The element might be there, but it could be a "False Positive."
  2. "Definitely not in set": Guaranteed to be correct. There are zero False Negatives.

The Core Concept: Bit Arrays and Hashing

A Bloom Filter consists of:

  1. A Bit Array of size $m$, initialized to 0.
  2. $k$ different Hash Functions, each mapping an element to one of the $m$ array positions.

To add an element: Run it through all $k$ hash functions and set the bits at those positions to 1. To query an element: Run it through all $k$ hash functions. If all bits at those positions are 1, the element is "possibly in set." If any bit is 0, the element is "definitely not in set."


Bloom Filter Implementation in Java (Simple)

import java.util.BitSet;
import java.util.function.Function;

public class BloomFilter<T> {
    private BitSet bitSet;
    private int bitSetSize;
    private int numHashFunctions;
    private Function<T, Integer>[] hashFunctions;

    public BloomFilter(int m, int k) {
        this.bitSetSize = m;
        this.numHashFunctions = k;
        this.bitSet = new BitSet(m);
    }

    public void add(T element) {
        for (int i = 0; i < numHashFunctions; i++) {
            int hash = Math.abs((element.hashCode() + i * 31) % bitSetSize);
            bitSet.set(hash);
        }
    }

    public boolean mightContain(T element) {
        for (int i = 0; i < numHashFunctions; i++) {
            int hash = Math.abs((element.hashCode() + i * 31) % bitSetSize);
            if (!bitSet.get(hash)) {
                return false; // Definitely not there
            }
        }
        return true; // Might be there
    }
}

Why use Bloom Filters?

Feature Hash Set Bloom Filter
Space Complexity $O(N)$ - stores actual data $O(M)$ - very small bit array
Accuracy 100% Probabilistic (False Positives)
Deletion Supported Not supported (Standard version)
Performance Fast Extremely Fast

Real-World Applications

  1. Database Caching: Google BigTable and Apache Cassandra use Bloom Filters to avoid looking up non-existent rows on disk (preventing expensive I/O).
  2. Web Browsers: Google Chrome once used Bloom Filters to check if a URL was a known malicious site before performing a full server-side check.
  3. Networking: Routers use them to track blocked IP addresses or cache keys without using much memory.

Summary

Bloom Filters are a masterclass in trading accuracy for extreme efficiency. By accepting a small, controllable error rate, you can handle massive datasets that would never fit into a standard Hash Map. Understanding this trade-off is a key skill for senior backend engineers and system architects.

📚

Recommended Resources

Java Masterclass — UdemyBest Seller

Comprehensive Java course covering Java 17+, OOP, concurrency, and modern APIs.

View Course
Effective Java, 3rd EditionMust Read

Joshua Bloch's classic guide to writing clear, correct, and efficient Java code.

View on Amazon
Java Concurrency in Practice

The authoritative book on writing thread-safe, concurrent Java programs.

View on Amazon

Practical engineering notes

Get the next backend guide in your inbox

One useful note when a new deep dive is published: system design tradeoffs, Java production lessons, Kafka debugging, database patterns, and AI infrastructure.

No spam. Just practical notes you can use at work.

Sachin Sarawgi

Written by

Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.

Found this useful? Share it: