Writing a hash function in Java:
a practical guide to implementing hashCode()
If you started by reading this site's introduction to hash maps,
then you probably saw the example of the hash function of the String class. This function works by combining the values of
all characters making up the string. On this page, we'll look at some rough-and-ready
patterns for making a hash code function without going into too much of the theory.
Basic hash function guidelines
What we basically want to achieve with a hash function is as follows:
- we want to combine all data values that vary into the hash code;
- we want the bits of the data that vary most randomly to
affect the lower bits of the hash code;
- given a set of typical random keys, we want hash codes where as many bits
as possible have roughly 50-50 chance of being a 0 or 1 (in other
words are "as randomly distributed as possible"), especially in the lower bits;
- we want the hash function to execute quickly: in other words, it should
be a "couple of lines of code" written with a few simple arithmetic operations.
The desire to have hash codes that have a random distribution especially in the
lower bits isn't necessarily a universal requirement, but it is
desirable because of the way Java hash maps work1.
Here are some examples of data fields and how we might achieve the above goals. Note
that we'll just give the "raw" hash function here, and look separately at how to
actually build hash functions into a class.
Object fields | Example hash code | Comments |
Two ints, each fairly randomly distributed between 0 and a fairly large number2. | x ^ y | XORing two numbers with roughly random distribution results in another number still with roughly random distribution3, but which now depends on the two values. |
Two objects whose classes already have good hash functions (for example Strings or generally other standard library classes). | x.hashCode() ^ y.hashCode() | If the two classes already have good (randomly distributed) hash code functions, then you can treat the two hash codes as "randomly distributed integers" and XOR them. |
Two ints that have a biased distribution. | (x * p) ^ y Where p is a prime number4 and/or odd number close to a power of 2, such as 17 or 33. | Multiplying one of the numbers (or, put another way, shifting plus adding/subtracting from itself) will help to ensure that the "biased" bits of one number interact with the more "random" bits of the other, so that the bias is "ironed out" over more of the bits.
|
enums | Treat as "biased ints": (x.hashCode() * 17) ^ y.hashCode() | Enums typically have low values (favour the lower bits), so they should be combined with a multiplication to spread them across a wider bit range of the hash code.
|
Two doubles or floats. | Wrap the primitive in a Float or Double object then call its hashCode() method. new Float(x).hashCode() ^ new Double(y).hashCode() (new Double(x).hashCode() >> 13) ^ new Double(y).hashCode() | For efficiency (but the JIT compiler may remove the need) you could directly copy the code from Double.hashCode() or Float.hashCode() directly into your hash function. See the hashCode() method of Point2D for an example of this.
Hash codes of floats and doubles, in particular those representing powers of 2 (including negative powers of 2: 1/2, 1/4 etc), typically have more variation in the upper bits than in the lower ones. The HashMap's secondary hash code alleviates this problem somewhat, but it may be worth combining the hash codes by shifting one of them right
by several bits so that more variation ends up in the lower bits.
|
If you're really stuck...
If you're really stuck on what to use as a hash function, a last resort
(that is actually not as bad as it sounds) is to append all of the variables in
string format and then use String.hashCode()
public int hashCode() {
int hc = cachedHashCode;
if (hc == 0) {
String varstr = var1 + var2 + var3;
hc = varstr.hashCode();
}
return hc;
}
Because creating the string is a little bit of an overhead, you'll probably
want to cache the hash code once calculated, as in this example.
Making classes hash map compatible
Once you have decided on a hash function, you need to actually "plug it in" to your
Java class. As discussed on the next page, you essentially do this by
overriding hashCode() and equals()
in the class in question.
1. Since Java 1.4, the HashMap and related classes uses a table that always
has a power of two as the number of buckets. In other words, it will always take
the x lowest bits of a given hash code to determine the bucket number.
The HashMap implementation also uses an additional hash function to try
and "spread" the randomness of bits in the hash function. In other words, a hash
function which has some random bits, even if not the low-order bits, will
work to some extent, but still not as well as if it had random low-order bits.
Note that this means that the common guidance to "always use a prime number of
buckets" doesn't apply to Java HashMaps; even if you construct a HashMap
with a prime number, this capacity will be rounded up to the next-biggest power
of two.
2. By "fairly large", we mean large at least in proportion to the number of
integers we'll typically put in the map. Generally we mean at least as large as the
square of this number. So if we're going to put 200 numbers in the map, we
want them to be distributed fairly randomly between 0 and 200 x 200 = 400000 at the very least.
3. At each bit of the two numbers to combine, a 0 is output if the two bits are equal, else a 1. In other words, in 50% of the combinations, a 1 will be output. So if the two input bits each have a roughly 50-50 chance of being 0 or 1, then so too will the output bit.
4. Actually choosing which number to use as p can be a black art, but in many cases those mentioned will work fairly well. If you know something more precise about the number distributions, you may be able to choose a better operation to average out the bias. For example, if you know that "x is always a fairly random multiple of 8 and y is always fairly random", then you could use (x >> 3) ^ y (effectively divide x by 8 before XORing with y).
If you enjoy this Java programming article, please share with friends and colleagues. Follow the author on Twitter for the latest news and rants.
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.