This subsection describes resize policies. It is organized as follows:
Hash-tables, as opposed to trees, do not naturally grow or shrink. It is necessary to specify policies to determine how and when a hash table should change its size.
In general, resize policies can be decomposed into (probably orthogonal) policies:
Size policies determine how a hash table changes size. These policies are simple, and there are relatively few sensible options. An exponential-size policy (with the initial size and growth factors both powers of 2) works well with a mask-based range-hashing function (see the Range-Hashing Policies subsection), and is the hard-wired policy used by Dinkumware [dinkumware_stl]. A prime-list based policy works well with a modulo-prime range hashing function (see the Range-Hashing Policies subsection), and is the hard-wired policy used by SGI's implementation [sgi_stl].
Trigger policies determine when a hash table changes size. Following is a description of two polcies: load-check policies, and a collision-check policies.
Load-check policies are straightforward. The user specifies two factors, αmin and αmax, and the hash table maintains the invariant that
αmin ≤ (number of stored elements) / (hash-table size) ≤ αmax (1) .
Collision-check policies work in the opposite direction of load-check policies. They focus on keeping the number of collisions moderate and hoping that the size of the table will not grow very large, instead of keeping a moderate load-factor and hoping that the number of collisions will be small. A maximal collision-check policy resizes when the shortest probe-sequence grows too large.
Consider Figure Balls and bins. Let the size of the hash table be denoted by m, the length of a probe sequence be denoted by k, and some load factor be denoted by α. We would like to calculate the minimal length of k, such that if there were α m elements in the hash table, a probe sequence of length k would be found with probability at most 1/m.
Denote the probability that a probe sequence of length k appears in bin i by pi, the length of the probe sequence of bin i by li, and assume uniform distribution. Then
P(l1 ≥ k) =
P(l1 ≥ α ( 1 + k / α - 1 ) ≤ (a)
e ^ ( - ( α ( k / α - 1 )2 ) /2 ) ,
where (a) follows from the Chernoff bound [motwani95random]. To calculate the probability that some bin contains a probe sequence greater than k, we note that the li are negatively-dependent [dubhashi98neg]. Let I(.) denote the indicator function. Then
P ( ∑ i = 1m I(li ≥ k) ≥ 1 ) =
P ( ∑ i = 1m I ( li ≥ k ) ≥ m p1 ( 1 + 1 / (m p1) - 1 ) ) ≤ (a)
e ^ ( ( - m p1 ( 1 / (m p1) - 1 ) 2 ) / 2 ) ,
where (a) follows from the fact that the Chernoff bound can be applied to negatively-dependent variables [dubhashi98neg]. Inserting (2) into (3), and equating with 1/m, we obtain
k ~ √ ( 2 α ln 2 m ln(m) ) ) .
The resize policies in the previous subsection are conceptually straightforward. The design of hash-based containers' size-related interface is complicated by some factors.
Clearly, the more of these points an interface addresses, the greater its flexibility but the lower its encapsulation and uniformity between associative containers.
This library attempts to address these types of problems by delegating all size-related functionality to policy classes. Hash-based containers are parameterized by a resize-policy class (among others), and derive publicly from the resize-policy class [alexandrescu01modern] E.g., a collision-chaining hash table is defined as follows:
cc_ht_map< class Key, class Data, ... class Resize_Policy ...> : public Resize_Policy
The containers themselves lack any functionality or public interface for manipulating sizes. A container object merely forwards events to its resize policy object and queries it for needed actions.
Figure Insert resize sequence diagram shows a (possible) sequence diagram of an insert operation. The user inserts an element; the hash table notifies its resize policy that a search has started (point A); in this case, a single collision is encountered - the table notifies its resize policy of this (point B); the container finally notifies its resize policy that the search has ended (point C); it then queries its resize policy whether a resize is needed, and if so, what is the new size (points D to G); following the resize, it notifies the policy that a resize has completed (point H); finally, the element is inserted, and the policy notified (point I).
This addresses, to some extent, the problems mentioned above:
The library contains a single class for instantiating a resize policy, pb_assoc contains a standard resize policy, hash_standard_resize_policy (the name is explained shortly). In terms of interface, it is parameterized by a boolean constant indicating whether its public interface supports queries of actual size and external resize operations (the inclusion and exclusion of these methods in the interface have obvious tradeoffs in terms of encapsulation and flexibility). ([alexandrescu01modern] shows many techniques for changing between alternative interfaces at compile time.)
As noted before, size and trigger policies are usually orthogonal. hash_standard_resize_policy is parameterized by size and trigger policies. For example, a collision-chaining hash table is typically be defined as follows:
cc_ht_map< key, data, ... hash_standard_resize_policy< some_trigger_policy, some_size_policy, ...> >
The sole function of hash_standard_resize_policy is to act as a standard delegator [gamma95designpatterns] for these policies.
Figures Standard resize policy trigger sequence diagram and Standard resize policy size sequence diagram show sequence diagrams illustrating the interaction between the standard resize policy and its trigger and size policies, respectively.
The library (currently) supports the following instantiations of size and trigger policies:
The trigger policies also support interfaces for changing their specifics which depend on compile time constants.
Figure Resize policy class diagram gives an overall picture of the resize-related classes. Container (which stands for any of the hash-based containers) is parameterized by Resize_Policy, from which it subclasses publicly [alexandrescu01modern]. This class is currently instantiated only by hash_standard_resize_policy. hash_standard_resize_policy itself is parameterized by Trigger_Policy and Size_Policy. Currently, Trigger_Policy is instantiated by hash_load_check_resize_trigger, or cc_hash_max_collision_check_resize_trigger; Size_Policy is instantiated by hash_exponential_size_policy, or hash_prime_size_policy.