Saturday, January 17, 2015

How HashMap works in Java.

                 HashMap in Java works on hashing principle. It is a data structure which allows us to store object and retrieve it in constant time O(1) , provided we know the key. In hashing, hash functions are used to link key and value in HashMap. Objects are stored by calling put(key, value) method of HashMap and retrieved by calling get(key) method. When we call put method, hashcode() method of key object is called so that hash function of map can find a bucket location to store value object, which is actually index of internal array, known as table. HashMap internally store mapping in form of Map.Entry object which contains both key and value object. When you want to retrieve the object, you call get() method and again pass key object. This time again key object generate same hash code (it's mandatory for it to do so to retrieve object and that's why HashMap keys are immutable e.g. String) and we end up at same bucket location. If there is only one object then it is returned and that's your value object which you have stored earlier. Things get little tricky when collisions occurs. Since internal array of HashMap is of fixed size, and if you keep storing objects, at some point of time hash function will return same bucket location for two different keys, this is called collision in HashMap. In this case, a linked list is formed at that bucket location and new entry is stored as next node. If we try to retrieve object from this linked list, we need an extra check to search correct value, this is done by equals() method. Since each node contains an entry, HashMap keep comparing entry's key object with passed key using equals() and when it return true, Map returns corresponding value. Since searching in lined list is O(n) operation, in worst case hash collision reduce a map to linked list. This issue is recently addressed in Java 8 by replacing linked list to tree to search in O(logN) time. By the way, you can easily verify how HashMap work by looking at code of HashMap.java in your Eclipse IDE.


1) How does get and put method works in hashmap?

When we pass Key and Value object  to put() method on Java HashMap, HashMap implementation calls hashCode method on Key object and applies returned hashcode into its own hashing function to find a bucket location for storing Entry object, important point to mention is that HashMap in Java stores both key and value object as Map.Entry in bucket which is essential to understand the retrieving logic.

a) The underlying table
transient Entry<K,V>[] table;
table = new Entry[capacity]; // Inside constructor. Constructs an empty HashMap with the specified initial //capacity and load factor.

b) Entry Class inside HashMap. It is a static inner class.

 static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        // Getter + setters here
    }

c) Put mehod
 public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

    static int indexFor(int h, int length) {
        return h & (length-1);
    }

 void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null != table[bucketIndex])) {
            resize(2 * table.length);
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }

    void createEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        size++;
    }

 d) Get method

  public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }


 final Entry<K,V> getEntry(Object key) {
        int hash = (key == null) ? 0 : hash(key);
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }


e) Null is handled separately without using hascode and equals method

  private V putForNullKey(V value) {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;
        addEntry(0, null, value, 0);
        return null;
    }


 private V getForNullKey() {
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        return null;
    }

2) When two different objects have same hashcode?

Remember that two unequal object in Java can have same hash code. Since hashcode is same, bucket location would be same and collision will occur in HashMap, Since HashMap use LinkedList to store object, this entry (object of Map.Entry comprise key and value) will be stored in LinkedList. But note that  Java HashMap  doesn't append the new element at tail instead it append new element at head to avoid tail traversing.
           When we call get() method  for such object,then HashMap uses Key Object's hashcode to find out bucket location and retrieves Value object but then you need to remember that there are two Value objects are stored in same bucket , so you have to traverse the LinkedList until we find the value object .Then you ask how do you identify value object because you don't  have value object to compare.
But remember  HashMap  stores both Key and Value in LinkedList node or as Map.Entry. So we will call key.equals() method to identify correct node in LinkedList and return associated value object for that key in Java HashMap .
Please refer to the code above for more clear picture.

3) Sample code to test collision.

class MyNumber
{
    Integer num;
    int id;
   
    public MyNumber(Integer num, int id) {
        this.num = num;
        this.id = id;
    }
   
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((num == null) ? 0 : num % 10);
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        MyNumber other = (MyNumber) obj;
        if (id != other.id)
            return false;
        if (num == null) {
            if (other.num != null)
                return false;
        } else if (!num.equals(other.num))
            return false;
        return true;
    }   
}
public static <K> void main(String[] args) {

Map<MyNumber, String> map = new HashMap<MyNumber, String>();

        System.out.println("size: " + map.size());
        map.put(new MyNumber(1, 1), "One");
        map.put(new MyNumber(11, 11), "Eleven");
        map.put(new MyNumber(20, 20), "Twenty"); // Place debug point here
        System.out.println("size: " + map.size());
}












4) HashMap Changes in JDK 1.7 and JDK 1.8

There is some performance improvement done on HashMap and ArrayList from JDK 1.7, which reduce memory consumption. Due to this empty Map are lazily initialized and will cost you less memory. Earlier, when you create HashMap e.g. new HashMap() it automatically creates array of default length e.g. 16. After some research, Java team founds that most of this Map are temporary and never use that many elements, and only end up wasting memory. Also, From JDK 1.8 onwards HashMap has introduced an improved strategy to deal with high collision rate. Since a poor hash function e.g. which always return location of same bucket, can turn a HashMap into linked list, i.e. converting get() method to perform in O(n) instead of O(1) and someone can take advantage of this fact, Java now internally replace linked list to a binary true once certain threshold is breached. This ensures performance or order O(log(n)) even in worst case where hash function is not distributing keys properly.

5) Default initial capacity, Max capacity  and Default Load Factor for hashmap.

    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final int MAXIMUM_CAPACITY = 1 << 30;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    threshold = capacity * loadfactor;

 
Note: when hashmap size reaches the threshold value, rehashing of hashmap takes place and size of hashmap table gets doubled.


2 comments:

  1. Please Comment if you have any doubts...

    ReplyDelete
  2. Thanks for the details inside view about HashMap, the topic i was explaining you that day was hash collision.

    ReplyDelete