Java: Hashmap (Hashtable class is deprecated).
C#: Dictionary (Hashtable class is deprecated too).
/!\ A hash table does not guarantee that the order will remain constant over time.
Hashmap in Java
final Map<String, String> myLookupTable = new HashMap();
myLookupTable.put("key", "value");
myLookupTable.get("key");
myLookupTable.getOrDefault(key, defaultValueIfDoesNotExist);
myLookupTable.isEmpty();
One of the efficient ways to iterate over a HashMap object in Java is to make use of foreach loop by treating each entry as a Map. Entry object.
final Map<String, String> map = new HashMap<String, String>();
for (Map.Entry<String, Integer> entry : map.entrySet()) {
String key = entry.getKey();
Integer value = entry.getValue();
}
Implementation
There are two main issues that we should tackle, in order to design an efficient hashmap data structure:
– hash function design. Metaphor: Similar to the postal system where we assign a postcode (hash) to each mail address (value).
– collision handling. Very large input > limited output. Collision happens when two different keys are mapped to the same address.
Hashtable Modulo + Array Implementation
A hash table is just an array behind the scene.
An array of a LinkedList. The linked lists are used to store collisions per index.
Note that it could also be a binary search tree as explained in CTCI page 89. This has the benefit to do an inline tree traversal and display the values sorted.
In order to minimize the potential collisions, it is advisable to use a prime number as the base of modulo, e.g. 2069.
K is the bumber of buckets
M is the number of unique keys
Time Complexity:
Space Complexity: O(K + M)
How is the hashmap implemented in Java? default size?
The default initial capacity of the HashMap is 2^4 (16). The capacity of the HashMap is doubled each time it reaches the threshold.
Lookup via Hashing and Hash Function
The hashcode (see hashing) does not replace the real index in a hashmap, it’s just a way to look up quickly.
A hash function generates a hash code and then you map the hash code to an index.
The below diagram is confusing because it does not depict clearly the step to map the hash to an index.
Why should we get an index and not use the integer hash as-is?
Usually, the output of such a hash algorithm is inside a range of some large number, typically much larger than the space you have in your table.
How to get the index?
We’re going to use the modulo operator to get a range of key values.
A simple formula: hash(value) % SIZE = index, where SIZE is a prime number.
This will throw a bunch of collisions as seen in the example below.
Size is also referred as buckets number.
We use something called modulus calculation. The mod A % B is a way to say “how many times can you fit B in A, if there is a rest that’s what I want!”.
Example:
17 mod 20 = 17 (indeed 17 goes 1 time into 20, that is 17 and the difference between 20 and 17 is 3).
But, but, 37 mod 20 = 17 too! That leads us to collision handling.
If collision, just store a LIST of values. Just put all the values there chained together.
When using the key to retrieve it, just lookup across all of them. Just a way to handle collision.
What If the size of the hash table changes?
If you need to grow or shrink your hash table, you need to recompute all the keys since you used the size in your formula.
Example
Consider an example of hash table of size 20.
The following items are to be stored:
– (key, value)
– (1,20)
– (2,70)
– (42,80)
– (4,25)
– (12,44)
– (14,32)
– (17,11)
– (13,78)
– (37,98)
Sr. No. | Key | Hash | Array Index |
---|---|---|---|
1 | 1 | 1 % 20 = 1 | 1 |
2 | 2 | 2 % 20 = 2 | 2 |
3 | 42 | 42 % 20 = 2 | 2 |
4 | 4 | 4 % 20 = 4 | 4 |
5 | 12 | 12 % 20 = 12 | 12 |
6 | 14 | 14 % 20 = 14 | 14 |
7 | 17 | 17 % 20 = 17 | 17 |
8 | 13 | 13 % 20 = 13 | 13 |
9 | 37 | 37 % 20 = 17 | 17 |
Notice the multiple collisions at indexes 2 and 17.
Index Collisions
Hash Collision, see https://mocki.co/hash-collision/
2 strings or object can have the same hashcode.
/!\ collision and chaining
[“Alex”] => get the hashcode for Alex => lookup the index => and get the value
Why using a prime number?
To reduce the collisions (see http://stackoverflow.com/a/1147232/1132522).
index[hash(input)%2]
would result in a collision for half of all possible hashes and a range of values.
index[hash(input)%prime]
results in a collision of <2 of all possible hashes.
Fixing the divisor to the table size also ensures that the number cannot be greater than the table.
Implementation C# backed by an array of LinkedList for collision handling, see this post on SO.
public class HashTable {
private static final int DEFAULT_SIZE = 100;
private Node[][] lookup;
public HashTable() {
lookup = new Node[DEFAULT_SIZE];
}
public void put(final String key, final String value) {
if (get(key) != null) {
// then it is an updae!
// TODO
}
final int index = convertToHash(key.getHash(), lookup.length);
// Save key + value for lookup in case of collision
final Node newNode = new Node(key, value);
final Node existingNode = lookup[index];
// Check if we have a collision and a node already exists there
if (existingNode != null) {
// newNode <=> existingNode
newNode.next = existingNode;
existingNode.previous = newNode;
}
lookup[index] = newNode;
}
public String get(String key) {
final int index = value.getHash() % lookup.length;
final Node head = lookup[index];
final String result = null;
// Walk all the collisions until you find the right key
while (head != null && result != null) {
if (head.key == key) {
result = head.value;
}
head = head.next;
}
return result;
}
public void remove(final String key) {
// TODO
}
private int convertHashToIndex(final int hash, final int length) {
return hash % length;
}
}
public class Node {
public Node next;
public Node previoud;
public String key;
public String value;
}
Re-hashing
We need to re-hash as we change the capacity!
https://www.baeldung.com/java-hashmap-load-factor
Time Complexity
We usually assume that we have a good hash function for an interview.
Getting and setting from a hashtable is constant time (O(1)).
We could also have a linear time (O(N)) in the very worst case for a terrible hashtable (with lots of collisions, etc).
See Collison and Linear Probing and Collison resolution: Chaining.
In the real world we will make sure we have a pretty good HashTable implementation.
Hashtable collisions
Hashtables – what is a collision?
How to handle them?
Is there any effect on the Deletion of an element?
It may happen that the hashing technique is used to create an already-used index of the array.
A collision happens when distinct keys produce the same hashCode() value. In this case, typically there are buckets at the mapped index.
Solution 1: Open addressing
In such a case, we can search for the next empty location in the array by looking into the next cell until we find an empty cell. This technique is called linear probing.
Linear probing, while simple, is not a very good collision resolution strategy because it leads to clustering.
Solution 2: chaining
With chaining, a secondary data structure is utilized to hold any collisions.
Used by Dictionary class in C#.
A linked list of inserted records that collide with the same slot.
Alternative data structures can be used for chains instead of linked lists.
By using a self-balancing tree, for example, the theoretical worst-case time of a hash table can be brought down to O(log n) rather than O(n).
Sources
https://www.hackerrank.com/topics/hashing
https://www.tutorialspoint.com/java/java_hashmap_class.htm
https://www.dotnetperls.com/dictionary
https://www.tutorialspoint.com/data_structures_algorithms/hash_data_structure.htm
https://msdn.microsoft.com/en-us/library/ms379571%28v=vs.80%29.aspx?f=255&MSPPError=-2147217396
Hahs
From: https://www.hiredintech.com/classrooms/algorithm-design/lesson/43
https://www.tutorialspoint.com/data_structures_algorithms/hash_data_structure.htm
Amazing comparison with a library: http://stackoverflow.com/a/730813/1132522
Leave a Reply
Want to join the discussion?Feel free to contribute!