MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup.1 2 3 It was created by Austin Appleby in 2008 4 and, as of 8 January 2016,5 is hosted on GitHub along with its test suite named SMHasher. It also exists in a number of variants,6 all of which have been released into the public domain. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop.
Unlike cryptographic hash functions, it is not specifically designed to be difficult to reverse by an adversary, making it unsuitable for cryptographic purposes.
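As an illustration of how easily the function can be reversed, consider the finalization steps listed at the end of the Algorithm section. Each one (an XOR with a right-shifted copy, or a multiplication by an odd constant modulo 2^32) is a bijection on 32-bit values, so the whole finalizer can be undone exactly. The following C sketch shows this under those assumptions; `fmix32`, `inverse_mod_2_32`, and `unfmix32` are hypothetical helper names chosen for this example, not part of the reference code.

```c
#include <assert.h>
#include <stdint.h>

/* Forward finalizer: the last five steps of Murmur3_32. */
uint32_t fmix32(uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Multiplicative inverse of an odd value modulo 2^32 (Newton iteration). */
uint32_t inverse_mod_2_32(uint32_t a) {
    assert(a & 1u);              /* only odd values are invertible */
    uint32_t x = a;              /* a * a == 1 (mod 8): 3 correct bits */
    for (int i = 0; i < 5; i++) {
        x *= 2u - a * x;         /* each step doubles the correct bits */
    }
    return x;
}

/* Undo fmix32 by inverting each step in reverse order. */
uint32_t unfmix32(uint32_t h) {
    h ^= h >> 16;                        /* x ^= x >> 16 is its own inverse */
    h *= inverse_mod_2_32(0xc2b2ae35u);
    h ^= (h >> 13) ^ (h >> 26);          /* inverse of x ^= x >> 13 */
    h *= inverse_mod_2_32(0x85ebca6bu);
    h ^= h >> 16;
    return h;
}
```

With these helpers, `unfmix32(fmix32(x))` returns `x` for any 32-bit `x`, which is the opposite of the one-way behavior expected from a cryptographic hash.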
Variants
MurmurHash1
The original MurmurHash was created as an attempt to make a faster function than Lookup3.7 Although successful, it had not been tested thoroughly and was not capable of providing 64-bit hashes as in Lookup3. Its design would later be built upon in MurmurHash2, combining a multiplicative hash (similar to the Fowler–Noll–Vo hash function) with an Xorshift.
MurmurHash2
MurmurHash2 8 yields a 32- or 64-bit value. It comes in several variants, including ones that support incremental hashing and ones that are alignment-safe or endian-neutral.
- MurmurHash2 (32-bit, x86): The original version; contains a flaw that weakens collision resistance in some cases.9
- MurmurHash2A (32-bit, x86): A fixed variant using Merkle–Damgård construction. Slightly slower.
- CMurmurHash2A (32-bit, x86): MurmurHash2A, but works incrementally.
- MurmurHashNeutral2 (32-bit, x86): Slower, but endian- and alignment-neutral.
- MurmurHashAligned2 (32-bit, x86): Slower, but does aligned reads (safer on some platforms).
- MurmurHash64A (64-bit, x64): The original 64-bit version. Optimized for 64-bit arithmetic.
- MurmurHash64B (64-bit, x86): A 64-bit version optimized for 32-bit platforms. It is not a true 64-bit hash due to insufficient mixing of the stripes.10
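To make the "multiplicative hash plus xorshift" structure and the idea of endian- and alignment-neutral reads concrete, here is a minimal C sketch in the spirit of the 32-bit MurmurHash2. The constants `0x5bd1e995` and `24` are quoted as recalled from the public-domain reference source and should be checked against upstream before relying on them. The names `murmur2_32_sketch` and `read_le32` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read 4 bytes as a little-endian word, one byte at a time, so the result
   does not depend on host endianness or on the pointer's alignment. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

uint32_t murmur2_32_sketch(const uint8_t *key, size_t len, uint32_t seed) {
    assert(key != NULL || len == 0);
    const uint32_t m = 0x5bd1e995u;   /* multiplier (assumed from reference) */
    const int r = 24;                 /* xorshift amount (assumed) */
    uint32_t h = seed ^ (uint32_t)len;

    /* Body: mix one 4-byte block at a time. */
    while (len >= 4) {
        uint32_t k = read_le32(key);
        k *= m;
        k ^= k >> r;
        k *= m;
        h *= m;
        h ^= k;
        key += 4;
        len -= 4;
    }

    /* Tail: fold in the last 1 to 3 bytes, if any. */
    switch (len) {
    case 3: h ^= (uint32_t)key[2] << 16; /* fall through */
    case 2: h ^= (uint32_t)key[1] << 8;  /* fall through */
    case 1: h ^= key[0];
            h *= m;
    }

    /* Final avalanche. */
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
    return h;
}
```

Reading the input byte by byte costs some speed compared with a single word load, which matches the "slower, but neutral" trade-off described in the list above.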
The person who originally found the flaw in MurmurHash2 created an unofficial 160-bit version of MurmurHash2 called MurmurHash2_160.11
MurmurHash3
The current version, completed April 3, 2011, is MurmurHash3,12 13 which yields a 32-bit or 128-bit hash value. For 128-bit output, the x86 and x64 versions do not produce the same values, as the algorithms are optimized for their respective platforms. MurmurHash3 was released alongside SMHasher, a hash function test suite.
Implementations
The canonical implementation is in C++, but there are efficient ports for a variety of popular languages, including Python,14 C,15 Go,16 C#,13 17 D,18 Lua, Perl,19 Ruby,20 Rust,21 PHP,22 23 Common Lisp,24 Haskell,25 Elm,26 Clojure,27 Scala,28 Java,29 30 Erlang,31 Swift,32 Object Pascal,33 Kotlin,34 JavaScript,35 and OCaml.36
It has been adopted into a number of open-source projects, most notably libstdc++ (ver 4.6), nginx (ver 1.0.1),37 Rubinius,38 libmemcached (the C driver for Memcached),39 npm (nodejs package manager),40 maatkit,41 Hadoop,1 Kyoto Cabinet,42 Cassandra,43 44 Solr,45 vowpal wabbit,46 Elasticsearch,47 Guava,48 Kafka,49 and RedHat Virtual Data Optimizer (VDO).50
Vulnerabilities
Hash functions can be vulnerable to collision attacks, where a user chooses input data in such a way as to intentionally cause hash collisions. Jean-Philippe Aumasson and Daniel J. Bernstein were able to show that even implementations of MurmurHash using a randomized seed are vulnerable to so-called HashDoS attacks.51 With the use of differential cryptanalysis, they were able to generate inputs that would lead to a hash collision. The authors of the attack recommend using their own SipHash instead.
Algorithm

    algorithm Murmur3_32 is
        // Note: In this version, all arithmetic is performed with unsigned 32-bit integers.
        //       In the case of overflow, the result is reduced modulo 2^32.
        input: key, len, seed

        c1 ← 0xcc9e2d51
        c2 ← 0x1b873593
        r1 ← 15
        r2 ← 13
        m ← 5
        n ← 0xe6546b64

        hash ← seed

        for each fourByteChunk of key do
            k ← fourByteChunk

            k ← k × c1
            k ← k ROL r1
            k ← k × c2

            hash ← hash XOR k
            hash ← hash ROL r2
            hash ← (hash × m) + n

        with any remainingBytesInKey do
            remainingBytes ← SwapToLittleEndian(remainingBytesInKey)
            // Note: Endian swapping is only necessary on big-endian machines.
            //       The purpose is to place the meaningful digits towards the low end of the value,
            //       so that these digits have the greatest potential to affect the low range digits
            //       in the subsequent multiplication. Consider that locating the meaningful digits
            //       in the high range would produce a greater effect upon the high digits of the
            //       multiplication, and notably, that such high digits are likely to be discarded
            //       by the modulo arithmetic under overflow. We don't want that.

            remainingBytes ← remainingBytes × c1
            remainingBytes ← remainingBytes ROL r1
            remainingBytes ← remainingBytes × c2

            hash ← hash XOR remainingBytes

        hash ← hash XOR len

        hash ← hash XOR (hash >> 16)
        hash ← hash × 0x85ebca6b
        hash ← hash XOR (hash >> 13)
        hash ← hash × 0xc2b2ae35
        hash ← hash XOR (hash >> 16)

A sample C implementation follows (for little-endian CPUs):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Mix one 32-bit block: multiply, rotate left by 15, multiply. */
    static inline uint32_t murmur_32_scramble(uint32_t k) {
        k *= 0xcc9e2d51;
        k = (k << 15) | (k >> 17);
        k *= 0x1b873593;
        return k;
    }

    uint32_t murmur3_32(const uint8_t* key, size_t len, uint32_t seed)
    {
        uint32_t h = seed;
        uint32_t k;
        /* Read in groups of 4. */
        for (size_t i = len >> 2; i; i--) {
            // Here is a source of differing results across endiannesses.
            // A swap here has no effects on hash properties though.
            memcpy(&k, key, sizeof(uint32_t));
            key += sizeof(uint32_t);
            h ^= murmur_32_scramble(k);
            h = (h << 13) | (h >> 19);
            h = h * 5 + 0xe6546b64;
        }
        /* Read the rest. */
        k = 0;
        for (size_t i = len & 3; i; i--) {
            k <<= 8;
            k |= key[i - 1];
        }
        // A swap is *not* necessary here because the preceding loop already
        // places the low bytes in the low places according to whatever endianness
        // we use. Swaps only apply when the memory is copied in a chunk.
        h ^= murmur_32_scramble(k);
        /* Finalize. */
        h ^= (uint32_t)len;
        h ^= h >> 16;
        h *= 0x85ebca6b;
        h ^= h >> 13;
        h *= 0xc2b2ae35;
        h ^= h >> 16;
        return h;
    }


Test string | Seed value | Hash value (hexadecimal) | Hash value (decimal) |
---|---|---|---|
(empty string) | 0x00000000 | 0x00000000 | 0 |
(empty string) | 0x00000001 | 0x514e28b7 | 1,364,076,727 |
(empty string) | 0xffffffff | 0x81f16f39 | 2,180,083,513 |
test | 0x00000000 | 0xba6bd213 | 3,127,628,307 |
test | 0x9747b28c | 0x704b81dc | 1,883,996,636 |
Hello, world! | 0x00000000 | 0xc0363e43 | 3,224,780,355 |
Hello, world! | 0x9747b28c | 0x24884cba | 612,912,314 |
The quick brown fox jumps over the lazy dog | 0x00000000 | 0x2e4ff723 | 776,992,547 |
The quick brown fox jumps over the lazy dog | 0x9747b28c | 0x2fa826cd | 799,549,133 |
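
The sample implementation above reads each 4-byte block with `memcpy`, so its output depends on host byte order. The variant below is a sketch, not the canonical code. It assembles every block byte by byte in little-endian order and should therefore give the same result on any host. The names `murmur3_32_portable`, `rotl32`, and `scramble32` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned r) {
    return (x << r) | (x >> (32u - r));   /* r is 13 or 15 here, never 0 */
}

/* Per-block mixing: multiply, rotate, multiply. */
static uint32_t scramble32(uint32_t k) {
    k *= 0xcc9e2d51u;
    k = rotl32(k, 15);
    k *= 0x1b873593u;
    return k;
}

uint32_t murmur3_32_portable(const uint8_t *key, size_t len, uint32_t seed) {
    assert(key != NULL || len == 0);
    uint32_t h = seed;
    size_t nblocks = len / 4;

    /* Body: build each block explicitly as a little-endian word. */
    for (size_t i = 0; i < nblocks; i++) {
        const uint8_t *p = key + 4 * i;
        uint32_t k = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        h ^= scramble32(k);
        h = rotl32(h, 13);
        h = h * 5u + 0xe6546b64u;
    }

    /* Tail: the remaining 0 to 3 bytes, lowest address in the lowest byte. */
    const uint8_t *tail = key + 4 * nblocks;
    uint32_t k = 0;
    for (size_t i = len & 3; i > 0; i--) {
        k = (k << 8) | tail[i - 1];
    }
    h ^= scramble32(k);   /* scramble32(0) == 0, so an empty tail is a no-op */

    /* Finalization. */
    h ^= (uint32_t)len;
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}
```

The rows of the table above can serve as regression checks for this function, for example by comparing `murmur3_32_portable((const uint8_t *)"test", 4, 0)` with the listed value for the string `test` and seed `0x00000000`.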
Footnotes
- "Hadoop in Java". Hbase.apache.org. 24 July 2011. Archived from the original on 12 January 2012. Retrieved 13 January 2012.
- "Couceiro et al" (PDF) (in Portuguese). p. 14. Retrieved 13 January 2012.
- Tanjent (tanjent) wrote, 3 March 2008 13:31:00. "MurmurHash first announcement". Tanjent.livejournal.com. Retrieved 13 January 2012.
- Austin Appleby. "SMHasher". Github.com. Retrieved 23 September 2024.
- "MurmurHash2-160". Simonhf.wordpress.com. 25 September 2010. Retrieved 13 January 2012.
- "MurmurHash1". GitHub. Retrieved 12 January 2019.
- "MurmurHash2Flaw". GitHub. Retrieved 15 January 2019.
- "MurmurHash3 (see note on MurmurHash2_x86_64)". GitHub. Retrieved 15 January 2019.
- "MurmurHash2_160". 25 September 2010. Retrieved 12 January 2019.
- Horvath, Adam (10 August 2012). "MurMurHash3, an ultra fast hash algorithm for C# /.NET".
- "pyfasthash in Python". Retrieved 13 January 2012.
- "C implementation in qLibc by Seungyoung Kim". GitHub.
- Landman, Davy. "Davy Landman in C#". Landman-code.blogspot.com. Retrieved 13 January 2012.
- "std.digest.murmurhash - D Programming Language". dlang.org. Retrieved 5 November 2016.
- "Toru Maesaka in Perl". metacpan.org. Retrieved 13 January 2012.
- Yuki Kurihara (16 October 2014). "Digest::MurmurHash". GitHub.com. Retrieved 18 March 2015.
- "stusmall/murmur3". GitHub. Retrieved 29 November 2015.
- "PHP userland implementation of MurmurHash3". github.com. Retrieved 18 December 2017.
- "tarballs_are_good / murmurhash3". Retrieved 7 February 2015.
- "Haskell". Hackage.haskell.org. Retrieved 13 January 2012.
- "Elm". package.elm-lang.org. Retrieved 12 June 2019.
- "Murmur3.java in Clojure source code on Github". clojure.org. Retrieved 11 March 2014.
- "Scala standard library implementation". GitHub. 26 September 2014.
- "Murmur3A and Murmur3F Java classes on Github". greenrobot. Retrieved 5 November 2014.
- "bipthelin/murmerl3". GitHub. Retrieved 21 October 2015.
- Daisuke T (7 February 2019). "MurmurHash-Swift". GitHub.com. Retrieved 10 February 2019.
- GitHub - Xor-el/HashLib4Pascal: Hashing for Modern Object Pascal
- "goncalossilva/kotlinx-murmurhash". GitHub.com. 10 December 2021. Retrieved 14 December 2021.
- raycmorgan (owner). "Javascript implementation by Ray Morgan". Gist.github.com. Retrieved 13 January 2012.
- INRIA. "OCaml Source". GitHub.com.
- "nginx". Retrieved 13 January 2012.
- "Rubinius". GitHub. Retrieved 29 February 2012.
- "libMemcached". libmemcached.org. Retrieved 21 October 2015.
- "maatkit". 24 March 2009. Retrieved 13 January 2012.
- "Kyoto Cabinet specification". Fallabs.com. 4 March 2011. Retrieved 13 January 2012.
- "Partitioners". apache.org. 15 November 2013. Retrieved 19 December 2013.
- "Introduction to Apache Cassandra™ + What's New in 4.0 by Patrick McFadin. DataStax Presents". YouTube. 10 April 2019.
- "Solr MurmurHash2 Javadoc". 31 August 2022. Archived from the original on 2 April 2015.
- Virtual Data Optimizer source code