What, how and why is Bloom Filter
A lot of us heard about aa concept or a thing called bloom filters when we started to learn and discover new thongs and concepts about software development however it usually seems to be scary word or more precise a word which may to mean a complex thing so a lot of developers don’t give much attention to understand what exactly bloom filter is.
What is bloom filter?
Bloom filter is a probabilistic data structure which aim to answer the following question- Is this element present in the Set?
Bloom filter is a Bit Vector with all bits set to zero at beginning then we can set it to one (true) when we start using it.
Why is bloom filter needed?
A Bloom filters is a space-efficient data structure, but it does not store the actual items since it is just a bit vector, item whose membership needs to be tested, it is also hashed via the same hash functions. If all the bits are already set for this, the element may exist in the set.
If any bit is not set, the element fo sure is not in the set.
Bloom filters has many applications but mostly they are used in caches to test if we have cache hit for example it is used in CDNs to avoid caching items that are rarely searched.
How bloom filter works?
In order to add an item, it must be hashed first using multiple hash functions. Bits are set at the index of the hashes in the bit vector.
For example, let’s assume we need to add the
Mohamed
element using three efficient hash functions: H1(Mohamed
) = 20222 H2(Mohamed
) = 24500 H3(Mohamed
) = 24894We can take the mod of 10 for all these values to get an index within the bounds of the bit vector: 20222 % 10 = 2, 24500 % 10= 0, 24894 % 10 = 4
For an item whose membership needs to be tested, it is also hashed via the same hash functions above. If all the bits are already set for this, the element may exist in the set, if not then is not there in the set