Post

C++ std -- unordered set/map

std::unordered_set

Both unordered_set and unordered_map use an internal __hash_table object, so it suffices to only talk about how hash table works in C++.

The basic structure looks like diagram below. It contains two parts: a single-linked list of nodes and an array of bucket head.

1
2
3
4
node1 --> node2 --> node3 --> ... --> node_n --> nullptr
  |                  |
  |                  |
[bucket1,           bucket2, ....           ]

The __bucket_list is simply an raw array. Each array element points to a node in the listed list. So the nodes between bucket_i and bucket_{i+1} belong to bucket_i.

Each node in the linked list contains a value. For unordered_set, its type is class Key. For unordered_map, it is pair<class Key, class Value>. It is unlike Java which uses HashMap to simulate HashSet, which needs an unnecessary value field. C++ is more space efficient.

There are two hash concepts in __hash_table. One is node hash. The node value type should define a function __hash() which is used to get the node hash. Another is bucket hash std::__constrain_hash(node_hash, bucket_count). It takes node’s hash and total number of buckets as input and return the bucket id this node belows to. In the simplest case, it is node_hash % bucket_count.

How does ::erase work?

Function erase calls remove. Basically,

  1. find the previous node of the node to be erased in the linked list.
  2. Deal with special cases: prev node is in the previous bucket, or the next node is in the next bucket. In these edge cases, after deleting the current node, the bucket becomes empty, so we need to delete the bucket as well.
  3. Delete the node: prev->next = curr->next; curr->next = null.

Note, the function returns a __node_holder.

1
    return __node_holder(__cn->__upcast(), _Dp(__node_alloc(), true));

So, the deleted node will not be destructed immediately. It is up to the calling site how to deal with this node. Usually, the calling site does nothing, and _Dp(__node_alloc()) destructs this node immediately once it goes out of scope.

std::unordered_map

operator[]

The specification says the value_type must be DefaultConstructible. Let’s say what happens if it is not. For a simple program as below

1
2
3
4
5
6
7
8
9
10
struct A {
  int a;
  A(int a_): a(a_) {}
};

int main() {
  unordered_map<int, A> u;
  auto x = u[0];
  cout << "value:" << x.a << std::endl;
}

The compilation errors is

1
2
error: no matching constructor for initialization of 'A'
      second(_VSTD::forward<_Args2>(_VSTD::get<_I2>(__second_args))...)

Let’s take a deeper look. The definition for operator[] is here. If the key is not found in the table, then it calls __construct_node_with_key to construct a k-v pair to insert to __table. The construction process uses allocator_traits::construct without constructor arguments. Namely, it calls the default constructor of the value type.

iterator

Sometimes, I mixed different iterators. Some iterators can do it+2, some can only do it++. So how does the iterator of unordered_map work?

According to cppreference, the iterator of unordered_map is a LegacyForwardIterator. It has only two operations: operator++ and operator++(int). See source code.

However, the <iterator> header defines a function std::advance(InputIt& it, Distance n). How does this work? The source code is here. Basically, depending on the iterator type, it can choose to directly add n to the base iterator, or use a loop to advance the iterator.

Btw, what type of iterator do other containers have?

  • LegacyForwardIterator: unordered_set, unordered_map,
  • LegacyBidirectionalIterator: set, map
  • LegacyRandomAccessIterator: vector

For set, map. It uses a tree underneath, the iterator definition is here. It defines both ++ and -- operators.

This post is licensed under CC BY 4.0 by the author.