AI 24/25 Project Software
Documentation for the AI 24/25 course programming project software
Loading...
Searching...
No Matches
hash.h
1#ifndef UTILS_HASH_H
2#define UTILS_HASH_H
3
4#include <cassert>
5#include <cstddef>
6#include <cstdint>
7#include <unordered_map>
8#include <unordered_set>
9#include <utility>
10#include <vector>
11
12namespace utils {
13/*
14 We provide a family of hash functions that are supposedly higher
15 quality than what is guaranteed by the standard library. Changing a
16 single bit in the input should typically change around half of the
17 bits in the final hash value. The hash functions we previously used
18 turned out to cluster when we tried hash tables with open addressing
19 for state registries.
20
21 The low-level hash functions are based on lookup3.c by Bob Jenkins,
22 May 2006, public domain. See http://www.burtleburtle.net/bob/c/lookup3.c.
23
24 To hash an object x, it is represented as a sequence of 32-bit
25 pieces (called the "code" for x, written code(x) in the following)
26 that are "fed" to the main hashing function (implemented in class
27 HashState) one by one. This allows a compositional approach to
28 hashing. For example, the code for a pair p is the concatenation of
29 code(p.first) and code(p.second).
30
31 A simpler compositional approach to hashing would first hash the
32 components of an object and then combine the hash values, and this
33 is what a previous version of our code did. The approach with an
34 explicit HashState object is stronger because the internal hash
35 state is larger (96 bits) than the final hash value and hence pairs
36 <x, y> and <x', y> where x and x' have the same hash value don't
37 necessarily collide. Another advantage of our approach is that we
38 can use the same overall hashing approach to generate hash values of
39 different types (e.g. 32-bit vs. 64-bit unsigned integers).
40
41 To extend the hashing mechanism to further classes, provide a
42 template specialization for the "feed" function. This must satisfy
43 the following requirements:
44
45 A) If x and y are objects of the same type, they should have code(x)
46 = code(y) iff x = y. That is, the code sequence should uniquely
47 describe each logically distinct object.
48
49 This requirement avoids unnecessary hash collisions. Of course,
50 there will still be "necessary" hash collisions because different
51 code sequences can collide in the low-level hash function.
52
53 B) To play nicely with composition, we additionally require that feed
54 implements a prefix code, i.e., for objects x != y of the same
55 type, code(x) must not be a prefix of code(y).
56
57 This requirement makes it much easier to define non-colliding
58 code sequences for composite objects such as pairs via
59 concatenation: if <a, b> != <a', b'>, then code(a) != code(a')
60 and code(b) != code(b') is *not* sufficient for concat(code(a),
61 code(b)) != concat(code(a'), code(b')). However, if we require a
62 prefix code, it *is* sufficient and the resulting code will again
63 be a prefix code.
64
65 Note that objects "of the same type" is meant as "logical type"
66 rather than C++ type.
67
68 For example, for objects such as vectors where we expect
69 different-length vectors to be combined in the same containers (=
70 have the same logical type), we include the length of the vector as
71 the first element in the code to ensure the prefix code property.
72
73 In contrast, for integer arrays encoding states, we *do not* include
74 the length as a prefix because states of different sizes are
75 considered to be different logical types and should not be mixed in
76 the same container, even though they are represented by the same C++
77 type.
78*/
79
80/*
81 Circular rotation (http://stackoverflow.com/a/31488147/224132).
82*/
83inline uint32_t rotate(uint32_t value, uint32_t offset)
84{
85 return (value << offset) | (value >> (32 - offset));
86}
87
88/*
89 Store the state of the hashing process.
90
91 This class can either be used by specializing the template function
92 utils::feed() (recommended, see below), or by working with it directly.
93*/
94class HashState {
95 std::uint32_t a, b, c;
96 int pending_values;
97
98 /*
99 Mix the three 32-bit values bijectively.
100
101 Any information in (a, b, c) before mix() is still in (a, b, c) after
102 mix().
103 */
104 void mix()
105 {
106 a -= c;
107 a ^= rotate(c, 4);
108 c += b;
109 b -= a;
110 b ^= rotate(a, 6);
111 a += c;
112 c -= b;
113 c ^= rotate(b, 8);
114 b += a;
115 a -= c;
116 a ^= rotate(c, 16);
117 c += b;
118 b -= a;
119 b ^= rotate(a, 19);
120 a += c;
121 c -= b;
122 c ^= rotate(b, 4);
123 b += a;
124 }
125
126 /*
127 Final mixing of the three 32-bit values (a, b, c) into c.
128
129 Triples of (a, b, c) differing in only a few bits will usually produce
130 values of c that look totally different.
131 */
132 void final_mix()
133 {
134 c ^= b;
135 c -= rotate(b, 14);
136 a ^= c;
137 a -= rotate(c, 11);
138 b ^= a;
139 b -= rotate(a, 25);
140 c ^= b;
141 c -= rotate(b, 16);
142 a ^= c;
143 a -= rotate(c, 4);
144 b ^= a;
145 b -= rotate(a, 14);
146 c ^= b;
147 c -= rotate(b, 24);
148 }
149
150public:
151 HashState()
152 : a(0xdeadbeef)
153 , b(a)
154 , c(a)
155 , pending_values(0)
156 {
157 }
158
159 void feed(std::uint32_t value)
160 {
161 assert(pending_values != -1);
162 if (pending_values == 3) {
163 mix();
164 pending_values = 0;
165 }
166 if (pending_values == 0) {
167 a += value;
168 ++pending_values;
169 } else if (pending_values == 1) {
170 b += value;
171 ++pending_values;
172 } else if (pending_values == 2) {
173 c += value;
174 ++pending_values;
175 }
176 }
177
178 /*
179 After calling this method, it is illegal to use the HashState object
180 further, i.e., make further calls to feed, get_hash32 or get_hash64. We
181 set pending_values = -1 to catch such illegal usage in debug mode.
182 */
183 std::uint32_t get_hash32()
184 {
185 assert(pending_values != -1);
186 if (pending_values) {
187 /*
188 pending_values == 0 can only hold if we never called
189 feed(), i.e., if we are hashing an empty sequence.
190 In this case we don't call final_mix for compatibility
191 with the original hash function by Jenkins.
192 */
193 final_mix();
194 }
195 pending_values = -1;
196 return c;
197 }
198
199 /*
200 See comment for get_hash32.
201 */
202 std::uint64_t get_hash64()
203 {
204 assert(pending_values != -1);
205 if (pending_values) {
206 // See comment for get_hash32.
207 final_mix();
208 }
209 pending_values = -1;
210 return (static_cast<std::uint64_t>(b) << 32) | c;
211 }
212};
213
214/*
215 These functions add a new object to an existing HashState object.
216
217 To add hashing support for a user type X, provide an override
218 for utils::feed(HashState &hash_state, const X &value).
219*/
220static_assert(
221 sizeof(int) == sizeof(std::uint32_t),
222 "int and uint32_t have different sizes");
223
224static_assert(
225 sizeof(unsigned int) == sizeof(std::uint32_t),
226 "unsigned int and uint32_t have different sizes");
227
228template <typename I>
229inline void feed_integer(HashState& hash_state, I value)
230{
231 if constexpr (sizeof(I) == sizeof(std::uint32_t)) {
232 hash_state.feed(static_cast<std::uint32_t>(value));
233 } else {
234 hash_state.feed(static_cast<std::uint32_t>(value));
235 value >>= 32;
236 hash_state.feed(static_cast<std::uint32_t>(value));
237 }
238}
239
240inline void feed(HashState& hash_state, int value)
241{
242 feed_integer(hash_state, value);
243}
244
245inline void feed(HashState& hash_state, long int value)
246{
247 feed_integer(hash_state, value);
248}
249
250inline void feed(HashState& hash_state, long long int value)
251{
252 feed_integer(hash_state, value);
253}
254
255inline void feed(HashState& hash_state, unsigned int value)
256{
257 feed_integer(hash_state, value);
258}
259
260inline void feed(HashState& hash_state, unsigned long int value)
261{
262 feed_integer(hash_state, value);
263}
264
265inline void feed(HashState& hash_state, unsigned long long int value)
266{
267 feed_integer(hash_state, value);
268}
269
270template <typename T>
271void feed(HashState& hash_state, const T* p)
272{
273 // This is wasteful in 32-bit mode, but we plan to discontinue 32-bit
274 // compiles anyway.
275 feed(hash_state, reinterpret_cast<std::uint64_t>(p));
276}
277
278template <typename T1, typename T2>
279void feed(HashState& hash_state, const std::pair<T1, T2>& p)
280{
281 feed(hash_state, p.first);
282 feed(hash_state, p.second);
283}
284
285template <typename T>
286void feed(HashState& hash_state, const std::vector<T>& vec)
287{
288 /*
289 Feed vector size to ensure that no two different vectors of the same type
290 have the same code prefix.
291
292 Using uint64_t is wasteful on 32-bit platforms but feeding a size_t breaks
293 the build on MacOS (see msg7812).
294 */
295 feed(hash_state, static_cast<uint64_t>(vec.size()));
296 for (const T& item : vec) {
297 feed(hash_state, item);
298 }
299}
300
301template <typename T>
302void feed_iterable(HashState& hash_state, T begin, T end)
303{
304 /*
305 Feed vector size to ensure that no two different vectors of the same type
306 have the same code prefix.
307
308 Using uint64_t is wasteful on 32-bit platforms but feeding a size_t breaks
309 the build on MacOS (see msg7812).
310 */
311 for (auto it = begin; it != end; ++it) {
312 feed(hash_state, *it);
313 }
314}
315
316template <typename... T>
317void feed(HashState& hash_state, const std::tuple<T...>& t)
318{
319 std::apply(
320 [&](auto&&... element) { ((feed(hash_state, element)), ...); },
321 t);
322}
323
324/*
325 Public hash functions.
326
327 get_hash() is used internally by the HashMap and HashSet classes below. In
328 more exotic use cases, such as implementing a custom hash table, you can also
329 use `get_hash32()`, `get_hash64()` and `get_hash()` directly.
330*/
331template <typename T>
332std::uint32_t get_hash32(const T& value)
333{
334 HashState hash_state;
335 feed(hash_state, value);
336 return hash_state.get_hash32();
337}
338
339template <typename T>
340std::uint64_t get_hash64(const T& value)
341{
342 HashState hash_state;
343 feed(hash_state, value);
344 return hash_state.get_hash64();
345}
346
347template <typename T>
348std::size_t get_hash(const T& value)
349{
350 return static_cast<std::size_t>(get_hash64(value));
351}
352
353// This struct should only be used by HashMap and HashSet below.
354template <typename T>
355struct Hash {
356 std::size_t operator()(const T& val) const { return get_hash(val); }
357};
358
359/*
360 Aliases for hash sets and hash maps in user code.
361
362 Use these aliases for hashing types T that don't have a standard std::hash<T>
363 specialization.
364
365 To hash types that are not supported out of the box, implement utils::feed.
366*/
367template <typename T1, typename T2>
368using HashMap = std::unordered_map<T1, T2, Hash<T1>>;
369
370template <typename T>
371using HashSet = std::unordered_set<T, Hash<T>>;
372} // namespace utils
373
374#endif