A few days ago I have started working on a PHP library for algorithms and data structures. PHPAlgorithms is still in progress and thus has a early beta status. But today, I have pushed the first data structure which is nearly full implemented: HashMaps.
A HashMap is characterized among others by its key-value storage property. The value is assigned to the key where the key is unique in within the map whereas the value can exist several times (assigned to different keys). When adding a key-value pair to the map, existing values corresponding to the key are overriden. If the key is not already present, the key-value-pair is going to be added to the map.
HashMaps differ from usual arrays by their bucket structure. Each HashMap has a defined size which is also the maximum number of possible buckets. Each key-value-pair is added to a bucket for a simple reason: when searching for an key/value, it should be enough to iterate over a bucket and not the entire map.
This means that the bucket index depends on the key. The index calculation is therefore performance-critical for HashMaps: if so-called hash-collisions occure, meaning that multiple keys have the same hash value, HashMaps get very ineffective. The worst case is that each key-value-pair is assigned to the same bucket and the map is in fact an array. Therefore, the key has to be hashed in a effective way where ideally keys that belong together are in the same bucket. In doing so, operations like finding K-Nearest-Neighbors or average calculations get cost-effective from the perspective of time and memory complexity (Big O).
Let’s have a closer look to the add() method:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | public function add($key, $value): bool { $arrayIndex = $this->getBucketIndex($key); $key = MapUtil::normalizeKey($key); if (isset($this->bucket[$arrayIndex])) { $list = $this->bucket[$arrayIndex]; } else { $list = new SinglyLinkedList(); } /* * the method checks the value if it is already * in the map or not. * * Notice that contains() looks for the value, not * key as below. */ if ($list->containsValue($value)) { return true; } $list->add($key, $value); $this->bucket[$arrayIndex] = $list; return true; } |
The method first calculates the bucket index and gets the head node at this index. Next, if the index is not in the array, a new node chain list is created. Otherwise, the list ist requested from the array. The value is not going to be added to the node chain when the value is already part of it. Otherwise, the value is added to the list and the bucket array at the hash position is updated.
CRUD Performance or Big O
HashMap uses a linked list in order to store values as node (chains). In the linked list class, inserting a node to the list hast O(1) in worst-case as the method adds the node to the top of the node chain (head).
1 2 3 4 5 6 | ... $node = new Node(); $node->setKey($key); $node->setValue($value); $this->append($node); ... |
The node object is created by setting the key and value. The append() method iterates over the head until the end is reached and sets the next attribute of the last node to the node that should be added to the list:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | public function append(?Node $node): bool { if ($node === null) { return false; } $head = $this->getHead(); if ($head === null) { $this->setHead($node); return true; } while ($head->getNext() !== null) { $head = $head->getNext(); } $head->setNext($node); return true; } |
Reading (Searching) and Updating in a hash map requires O(n) in worst-case where n is the bucket size at k’s hash position.
1 2 3 4 5 6 7 8 | $arrayIndex = $this->getBucketIndex($key); $head = $this->bucket[$arrayIndex]; while ($head !== null) { if ($head->getKey() === $key) { return $head; } $head = $head->getNext(); } |
Deleting from an hash map requires also O(n) as the whole bucket has to be searched for the key. If deletion should be done by value, the worst-case is O(m * n) where m is the number of buckets. Updating the next attribute of the searched node’s predecessor to its next attribute deletes the node from the chain.
It becomes clear that the way how the hash is calculated is very critical. If the hash value for every key is the same, chaining has no benefit and the hash map acts like an array (even worse: an array that calculates hashes and chains objects).
PHPAlgorithms on GitHub
PHPAlgorithms/HashMap is licensed under MIT and is available on GitHub. The library is in a early beta status. So use it on your own risk! and make a pull request if you find any bugs 🙂