14Mar
Why is important to override Object.hashCode() method

Java API defines several methods in java.lang.Object class with general contracts on which other classes rely on to be fulfilled by the implementations. These methods are: equals( ), hashCode( ), toString( ), clone( ). The Object class provides generic implementations for these methods, but any class you define should provide custom implementations that obey the general contracts. It is not mandatory to do it for all methods, but, depending on the context in which the class is used, some of them are mandatory to implement.

Coming back to hashCode( ), this method is very important when your class is used in conjunction with all hash-based collections: HashMap, HashSet, HashTable. According to the specification this method should behave as follows:

  • invoked multiple times on the same object, which did not alter its state, must return the same integer value; this is very important for mutable objects because during application execution, they can alter their state, which means the value returned by hashCode( ) should take this into account;
  • if two objects are equal (according to equals method) then calling hashCode( ) on each one of them must produce the same result;
  • if two objects are not equal (according to equals method) then calling hashCode( ) may or may not produce different values; however it is recommended to produce different values as it helps to improve performance.

From the specification above one thing is clear: whenever you override equals( ), override also hashCode( ).

Let’s consider the following simple example: we defined a class Point, which holds x and y coordinates of a point, and we want to associate a color to each point by using a Map<Point,String> (the color is represented as String):

class Point {
private int x;
private int y;
public Point( int x, int y ) {
this.x = x;
this.y = y;
}

public boolean equals( Object o ) {
if ( !(o instanceof Point )) {
return false;
}

Point p = ( Point )o;
return this.x == p.x && this.y == p.y;
}

public class ColorPoints {
public static void main( String[] args ) {
Map<Point, String> colorPoints = new HashMap<Point, String>( );
colorPoints.put( new Point(0,0), "#FF0000");
colorPoints.put( new Point(10,10), "#000000");
colorPoints.put( new Point(10,20), "#FFFFFF");
colorPoints.put( new Point(10,30), "#000000");

      String color = colorPoints.get( new Point(0,0));
System.out.println("Color is " + color );
}
}

When running the code, what will print to the console? You might be surprised but it will print: Color is null

Why is that? We are trying to get the color for a point we already have inserted into the map (0,0). What is the problem? The problem is in the way hash-based collections are handling the contained objects in order to improve the performance on access. Although equal (see equals( ) implementation), the two Point objects, are returning completely different values for hashCode( ) (since no hashCode( ) function was defined, the one from Object class is used). Based on the hash code values of the keys, HashMap implementation group the entries into “buckets", each “bucket" corresponding to a different hash code value. When the program tries to get the color of the Point(0,0) the object passed as parameter to the get method has a completely different hash code, which is not among those stored in the map, so there is no “bucket” associated to it. HashMap never calls equals( ) first, because it might be very expensive.

How can this be fixed? One dummy (but legal and unfortunately very common) solution is to just override the method hashCode( ) like this:

public int hashCode( ) {
return 1;
}

After re-running the code, it prints correctly Color is #FF0000. Now, since all Point objects have the same hash code, they are all in the same bucket. HashMap implementation identifies that bucket, and searches for the correct entry, by calling equals( ) method (in the bucket the entries are stored in a linked list). But this is damaging for the HashMap performance. The correct way is to have as many buckets as possible, in order to have as fewer entries as possible in them. Searching by hash codes is very fast but searching by calling equals( ) is much slower, and you should avoid making the collection making these calls to often.

An correct implementation of hashCode( ) should be based on the significant fields of the class (those fields that are also considered in the implementation of equals), in order to fulfill one of the requirements from the general contract: two equal objects should produce the same hash code.

@Override public int hashCode( ) {
int result = 11;
result = 17 * result + x;
result = 17 * result + y;
return result;
}

This will guarantee that we get distinct hash code values for different objects and will help HashMap to perform better.

Consider the following code for generating a large number of entries into the map:

Map<Point, String> colorPoints = new HashMap<Point, String>( );

long start = System.nanoTime();

for ( int i = 0; i < 360; i++ ) {
for (int j = 0; j < 360; j++ ) {
colorPoints.put( new Point(i,j), "#FF0000");
}
}

System.out.println("Time to load the data is:" + (System.nanoTime()- start) );

start = System.nanoTime();

String color = colorPoints.get( new Point(359,359));

System.out.println("Time to get the color for the point is:" + (System.nanoTime()- start) );

To conclude, keep in mind the following three items:Then run the program with the first dummy implementation of hashCode( ) and then run it with the second implementation. Notice the differences in the time spent not only to return a specific value but also to load the map with data?

  • override hashCode( ) anytime you have override equals( );
  • write an appropriate implementation of hashCode( ) which considers all significant fields of the object, in order to ensure a good performance for hash-based collections in which you will use the objects;
  • if the hash code value is expensive to calculate, store it into a instance variable, when the object is created or when is last modified.