Java – HashMap Internal implementation

I assume you are already aware of the basic usage of put(i.e to insert an element) and get(to fetch an element) methods of the HashMap, to understand HashMap’s internal implementation we will have a deeper look into these methods, meaning – how key-value pairs are stored and how value is fetched by key.

Do you know how HashMap works ?


If someone asks you or want’s to understand how does it work ?

Answer to this question will be very straight forward – On the principle of Hashing,
Where hashing function in involved in mapping Objects when putting the elements and also when getting the elements.

What do you mean by Hashing ?


Hashing is an important Data Structure which is designed to use a special function called the Hash function which is used to map a given value with a particular key for faster access of elements. In Java terms converting an Object into its integer form using hashcode() method. The hash function should return the same code each and every time the function is applied to the same object which means that two equal objects must produce the same hash code.

Whenever you insert new key-value pair using put() method, HashMap blindly doesn’t allocate index in the table[] array. Instead it calls hash function on the key. HashMap has its own hash function to calculate the hash code of the key. This function is implemented so that it overcomes default poorly implemented hashCode() methods. Below is implementation code of hash().

final int hash(Object k) {
  int h = 0;
  if (useAltHashing) {
  if (k instanceof String) {
return sun.misc.Hashing.stringHash32((String) k);
      }
      h = hashSeed;
  }
  h ^= k.hashCode();

// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load //factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

After calculating the hash code of the key, it calls indexFor() method by passing the hash code of the key and length of the table[] array. This method returns the index in the table[] array for that particular key-value pair.

/* Returns index for hash code h. */
static int indexFor(int h, int length) {
    return h & (length-1);
}

Buckets

A bucket is one element of HashMap array. It is used to store nodes. Two or more nodes can have the same bucket. In that case link list structure is used to connect the nodes. Buckets are different in capacity. A relation between bucket and capacity is as follows:

capacity = number of buckets * load factor

A single bucket can have more than one nodes, it depends on hashCode() method. The better your hashCode() method is, the better your buckets will be utilized.

Entry class in HashMap

So what is the use of Entry[] in a HashMap ? The HashMap stores the Objects as Entry instances, not as key and value.

Hmm.. So why is it so important that one should know about it ?

HashMap has an inner class called an Entry Class which holds the key and values. There is also something called next, which you will get to know a bit later.

static class Entry<K,V> implements Map.Entry<K,V>
{
    final K key;
    V value;
    Entry<K,V> next;
    final int hash;
    ........
}

You know that the HashMap stores the Entry instances in an array and not directly as key-value pairs. In order to store a value, it uses a method – put() of the HashMap. Let’s go and see how it works.

How HashMap.put() method works internally

When we call put method, hashcode() method of the key object class is called so that hash function of the map can find a bucket location to store value object, which is actually an index of the internal array, known as the table.

HashMap internally stores mapping in the form of Map.Entry object which contains both key and value object.

Since the internal array of HashMap is of fixed size, and if you keep storing objects, at some point of time hash function will return same bucket location for two different keys, this is called collision in HashMap. In this case, a linked list is formed at that bucket location and a new entry is stored as next node

If we try to retrieve an object from this linked list, we need an extra check to search correct value, this is done by equals() method. Since each node contains an entry, HashMap keeps comparing entry’s key object with the passed key using equals() and when it returns true, Map returns the corresponding value.

Since searching inlined list is O(n) operation, in worst case hash collision reduce a map to linked list. This issue is recently addressed in Java 8 by replacing linked list to the tree to search in O(logN) time.

  transient Entry[] table;
  /**
  * Associates the specified value with the specified key in this map. If the
  * map previously contained a mapping for the key, the old value is
  * replaced.
  *
  * @param key
  *        	key with which the specified value is to be associated
  * @param value
  *        	value to be associated with the specified key
  * @return the previous value associated with <tt>key</tt>, or <tt>null</tt>
  *     	if there was no mapping for <tt>key</tt>. (A <tt>null</tt> return
  *     	can also indicate that the map previously associated
  *     	<tt>null</tt> with <tt>key</tt>.)
  */
  public V put(K key, V value) {
      if (key == null)
      return putForNullKey(value);
      int hash = hash(key.hashCode());
      int i = indexFor(hash, table.length);
      for (Entry<K , V> e = table[i]; e != null; e = e.next) {
          Object k;
          if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
              V oldValue = e.value;
              e.value = value;
              e.recordAccess(this);
              return oldValue;
          }
      }
   
      modCount++;
      addEntry(hash, key, value, i);
      return null;
  }

How HashMap.get() method works internally


  /**
  * Returns the value to which the specified key is mapped, or {@code null}
  * if this map contains no mapping for the key.
  *
  *
   
  * More formally, if this map contains a mapping from a key {@code k} to a
  * value {@code v} such that {@code (key==null ? k==null :
  * key.equals(k))}, then this method returns {@code v}; otherwise it returns
  * {@code null}. (There can be at most one such mapping.)
  *
  *
   
  * A return value of {@code null} does not <i>necessarily</i> indicate that
  * the map contains no mapping for the key; it's also possible that the map
  * explicitly maps the key to {@code null}. The {@link #containsKey
  * containsKey} operation may be used to distinguish these two cases.
  *
  * @see #put(Object, Object)
  */
  public V get(Object key) {
      if (key == null)
      return getForNullKey();
      int hash = hash(key.hashCode());
      for (Entry<K , V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
          Object k;
          if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
              return e.value;
      }
      return null;
  }
  • Step 1 : First checks whether specified key is null or not. If the key is null, it calls getForNullKey()method.
  • Step 2 : If the key is not null, hash code of the specified key is calculated.
  • Step 3 : indexFor() method is used to find out the index of the specified key in the table[] array.
  • Step 4 : After getting index, it will iterate though linked list at that position and checks for the key using equals() method. If the key is found, it returns the value associated with it. otherwise returns null.

 Important Summary

  1. Time complexity is almost constant for put and get method until rehashing is not done.
  2. In case of collision, i.e. index of two or more nodes are same, nodes are joined by link list i.e. second node is referenced by first node and third by second and so on.
  3. If key given already exist in HashMap, the value is replaced with new value.
  4. hash code of null key is 0.
  5. When getting an object with its key, the linked list is traversed until the key matches or null is found on next field.
  6. Data structure to store entry objects is an array named table of type Entry.
  7. A particular index location in array is referred as bucket, because it can hold the first element of a linkedlist of entry objects.
  8. Key object’s hashCode() is required to calculate the index location of Entry object.
  9. Key object’s equals() method is used to maintain uniqueness of keys in map.
  10. Value object’s hashCode() and equals() method are not used in HashMap’s get() and put()methods.
  11. Hash code for null keys is always zero, and such entry object is always stored in zero index in Entry[].

Improvements in HashMap in Java 8

With Java 8 comes a new concept of conversion of buckets from Linked List to Balanced Trees in HashMaps.

Why ? Did it make anything better ? – Yes, it increases the performance and reduces the worst case time complexity from O(n) to O(log(n)).

Let’s go into a bit of Detail

In the Java 8 implementation of HashMap class three more parameters were introduced :

static final int TREEIFY_THRESHOLD = 8;
static final int UNTREEIFY_THRESHOLD = 6;
static final int MIN_TREEIFY_CAPACITY = 64;

TREEIFY_THRESHOLD : In a Hashmap when any bucket or index is have more than 8 elements with the same hashcode, then automatically the implementation is changed from linkedlist to Balanced Tree.

UNTREEIFY_THRESHOLD : If elements are not more than 6 elements in the same bucket, then the balanced tree reverts back to linkedList.

MIN_TREEIFY_CAPACITY : So conversion to Balanced Tree is not gonna happen each time the elements in a bucket becomes more than 8, but it should also check if the total number of buckets in that hashmap is more than 64. Only in this case the bucket of that HashMap will transform to a Balanced Tree.

You may have come across a term called Bins in Javadocs or portals, By bins they mean the elements or nodes of TreeNodes.

So the Bins (elements or nodes) of TreeNodes may be traversed and used like any others, but additionally support faster lookup when overpopulated. So this is one of the reason why Tree is being used here. However, since the vast majority of bins in normal use are not overpopulated, checking for the existence of tree bins may be delayed in the course of table methods.

Tree bins (i.e. bins whose elements are all TreeNodes) are ordered primarily by hashCode, but in the case of ties, if two elements are of the same “class C implements Comparable(C)“, type then their compareTo() method is used for ordering.

Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes. And when they become too small (due to removal or resizing) they are converted back to plain bins (by using property : UNTREEIFY_THRESHOLD = 6 as explained above). In cases when the hashcodes are well-distributed, tree bins are rarely used.

Now, the Performance talk !

Best case : If we have very good hashcode algo implemented, meaning – hashcodes of all the keys are distinct, then the get and put operations run in O(1), which also means a constant time(independent of the number of keys in the map)

Worst case (before java8 ) : In the linkedlist scenario, if hashcodes of all the keys are same then the operations can take O(n) time, where n = number of keys in the map.

Worst case (Java 8) : In the Tree scenario, if hashcodes of all the keys are same then the operations can take O(log(n)) time, where n = number of keys in the map.

Summary

So here we are, we hope now you are confident and know how HashMap works in java.

We tried our best to simplify the explanation, if you feel any difference or you need help in understanding any such thing, then feel free to contact us through email and subscribe to ProgrammerToday.