BBHash: some ideas

BBHash [cite/t:@bbhash] uses multiple layers to create a minimal perfect hashing functions (MPFH), that hashes some input set into $[n]$.

(See also my note on PTHash [cite:@pthash].)

Simply said, it maps the $n$ elements into $[γ ⋅ n]$ using hashing function $h_0$. The $k_0$ elements that have collisions are mapped into $[γ ⋅ k_0]$ using $h_1$. Then, the $k_1$ elements with collisions are mapped into $[γ ⋅ k_1]$, and so on.

Possible speedup?

One bottleneck seems to be that this needs multiple layers of lookups each at different positions in memory. A possible idea:

Using $h_0$, map all items into buckets with $∼ 100$ elements each.
Hash each of those buckets using the original ‘recursive’ BBhash technique into a cacheline of $256$ bits.

This way, only two lookups should be needed (apart from the rank operation that follows).

I have no idea how the analysis works out though – maybe the recursive strategy works better on a larger scale and when hashing only so few items there isn’t much benefit to the BBhash approach in the first place.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bbhash.org

bbhash.org

BBHash: some ideas

Possible speedup?

Files

bbhash.org

Latest commit

History

bbhash.org

File metadata and controls

BBHash: some ideas

Possible speedup?