Changeset 1b20da0 in mainline for kernel/generic/src/adt/cht.c
- Timestamp: 2018-02-28T17:52:03Z (7 years ago)
- Branches: lfn, master, serial, ticket/834-toolchain-update, topic/msim-upgrade, topic/simplify-dev-export
- Children: 3061bc1
- Parents: df6ded8
- git-author: Jiří Zárevúcky <zarevucky.jiri@…> (2018-02-28 17:26:03)
- git-committer: Jiří Zárevúcky <zarevucky.jiri@…> (2018-02-28 17:52:03)
- File: 1 edited
kernel/generic/src/adt/cht.c
rdf6ded8 r1b20da0 35 35 * @file 36 36 * @brief Scalable resizable concurrent lock-free hash table. 37 * 37 * 38 38 * CHT is a concurrent hash table that is scalable resizable and lock-free. 39 39 * resizable = the number of buckets of the table increases or decreases … … 48 48 * fraction of updates (insert/remove) increases. Other data structures 49 49 * significantly outperform CHT if the fraction of updates exceeds ~40%. 50 * 50 * 51 51 * CHT tolerates hardware exceptions and may be accessed from exception 52 52 * handlers as long as the underlying RCU implementation is exception safe. 53 * 53 * 54 54 * @par Caveats 55 * 55 * 56 56 * 0) Never assume an item is still in the table. 57 57 * The table may be accessed concurrently; therefore, other threads may … … 59 59 * in the table if cht_find() just returned it to you. Similarly, an 60 60 * item may have already been inserted by the time cht_find() returns NULL. 61 * 61 * 62 62 * 1) Always use RCU read locks when searching the table. 63 63 * Holding an RCU lock guarantees that an item found in the table remains 64 64 * valid (eg is not freed) even if the item was removed from the table 65 65 * in the meantime by another thread. 66 * 66 * 67 67 * 2) Never update values in place. 68 68 * Do not update items in the table in place, ie directly. The changes 69 69 * will not propagate to other readers (on other cpus) immediately or even 70 70 * correctly. Some readers may then encounter items that have only some 71 * of their fields changed or are completely inconsistent. 72 * 73 * Instead consider inserting an updated/changed copy of the item and 71 * of their fields changed or are completely inconsistent. 72 * 73 * Instead consider inserting an updated/changed copy of the item and 74 74 * removing the original item. Or contact the maintainer to provide 75 75 * you with a function that atomically replaces an item with a copy. 76 * 76 * 77 77 * 3) Use cht_insert_unique() instead of checking for duplicates with cht_find() 78 78 * The following code is prone to race conditions: … … 84 84 * @endcode 85 85 * See cht_insert_unique() on how to correctly fix this. 86 * 86 * 87 87 * 88 88 * @par Semantics 89 * 89 * 90 90 * Lazy readers = cht_find_lazy(), cht_find_next_lazy() 91 91 * Readers = lazy readers, cht_find(), cht_find_next() 92 * Updates = cht_insert(), cht_insert_unique(), cht_remove_key(), 92 * Updates = cht_insert(), cht_insert_unique(), cht_remove_key(), 93 93 * cht_remove_item() 94 * 95 * Readers (but not lazy readers) are guaranteed to see the effects 96 * of @e completed updates. In other words, if cht_find() is invoked 97 * after a cht_insert() @e returned eg on another cpu, cht_find() is 98 * guaranteed to see the inserted item. 99 * 94 * 95 * Readers (but not lazy readers) are guaranteed to see the effects 96 * of @e completed updates. In other words, if cht_find() is invoked 97 * after a cht_insert() @e returned eg on another cpu, cht_find() is 98 * guaranteed to see the inserted item. 99 * 100 100 * Similarly, updates see the effects of @e completed updates. For example, 101 * issuing cht_remove() after a cht_insert() for that key returned (even 101 * issuing cht_remove() after a cht_insert() for that key returned (even 102 102 * on another cpu) is guaranteed to remove the inserted item. 103 * 103 * 104 104 * Reading or updating the table concurrently with other updates 105 105 * always returns consistent data and never corrupts the table. 
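To make caveats 0, 1 and 3 above concrete, here is a minimal lookup sketch. The value_t type, its int key and the value_from_link() container-of macro are hypothetical illustrations (offsetof comes from stddef.h); cht_find(), rcu_read_lock() and rcu_read_unlock() are the calls documented in this file.

    /* Hypothetical item type; CHT only requires the embedded cht_link_t. */
    typedef struct {
        int key;
        int value;
        cht_link_t link;
    } value_t;

    #define value_from_link(lnk) \
        ((value_t *)(((char *)(lnk)) - offsetof(value_t, link)))

    static int lookup_value(cht_t *h, int *key)
    {
        int val = -1;

        /* Caveat 1: searches must run inside an RCU read section. */
        rcu_read_lock();

        cht_link_t *lnk = cht_find(h, key);
        if (lnk != NULL) {
            /*
             * Caveat 0: another cpu may remove the item at any moment,
             * but RCU keeps it allocated until rcu_read_unlock(), so
             * copy out what is needed while still in the read section.
             */
            val = value_from_link(lnk)->value;
        }

        rcu_read_unlock();
        return val;
    }

Caveat 3 is handled later in the file by cht_insert_unique(), whose doc comment shows the corrected insert-if-absent pattern.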
106 106 * However the effects of concurrent updates may or may not be 107 107 * visible to all other concurrent readers or updaters. Eg, not 108 * all readers may see that an item has already been inserted 109 * if cht_insert() has not yet returned. 110 * 108 * all readers may see that an item has already been inserted 109 * if cht_insert() has not yet returned. 110 * 111 111 * Lazy readers are guaranteed to eventually see updates but it 112 112 * may take some time (possibly milliseconds) after the update 113 113 * completes for the change to propagate to lazy readers on all 114 114 * cpus. 115 * 115 * 116 116 * @par Implementation 117 * 117 * 118 118 * Collisions in CHT are resolved with chaining. The number of buckets 119 * is always a power of 2. Each bucket is represented with a single linked 120 * lock-free list [1]. Items in buckets are sorted by their mixed hashes 121 * in ascending order. All buckets are terminated with a single global 122 * sentinel node whose mixed hash value is the greatest possible. 119 * is always a power of 2. Each bucket is represented with a single linked 120 * lock-free list [1]. Items in buckets are sorted by their mixed hashes 121 * in ascending order. All buckets are terminated with a single global 122 * sentinel node whose mixed hash value is the greatest possible. 123 123 * 124 124 * CHT with 2^k buckets uses the k most significant bits of a hash value … … 127 127 * hash function does not produce uniform hashes, hash values are 128 128 * mixed first so that the top bits of a mixed hash change even if hash 129 * values differ only in the least significant bits. The mixed hash 130 * values are cached in cht_link.hash (which is overwritten once the 129 * values differ only in the least significant bits. The mixed hash 130 * values are cached in cht_link.hash (which is overwritten once the 131 131 * item is scheduled for removal via rcu_call). 132 * 132 * 133 133 * A new item is inserted before all other existing items in the bucket 134 134 * with the same hash value as the newly inserted item (a la the original 135 * lock-free list [2]). Placing new items at the start of a same-hash 136 * sequence of items (eg duplicates) allows us to easily check for duplicates 137 * in cht_insert_unique(). The function can first check that there are 138 * no duplicates of the newly inserted item amongst the items with the 139 * same hash as the new item. If there were no duplicates the new item 140 * is linked before the same-hash items. Inserting a duplicate while 141 * the function is checking for duplicates is detected as a change of 142 * the link to the first checked same-hash item (and the search for 135 * lock-free list [2]). Placing new items at the start of a same-hash 136 * sequence of items (eg duplicates) allows us to easily check for duplicates 137 * in cht_insert_unique(). The function can first check that there are 138 * no duplicates of the newly inserted item amongst the items with the 139 * same hash as the new item. If there were no duplicates the new item 140 * is linked before the same-hash items. Inserting a duplicate while 141 * the function is checking for duplicates is detected as a change of 142 * the link to the first checked same-hash item (and the search for 143 143 * duplicates can be restarted). 144 * 144 * 145 145 * @par Table resize algorithm 146 * 146 * 147 147 * Table resize is based on [3] and [5]. First, a new bucket head array 148 148 * is allocated and initialized. 
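The index arithmetic implied above (a bucket is selected by the top k bits of the mixed hash, and the table grows or shrinks by a factor of 2, as elaborated in the resize discussion below) can be sketched as follows. This is an illustration of the idea only; the file's calc_bucket_idx(), grow_idx(), shrink_idx() and calc_split_hash() are assumed to be equivalent but are not shown in this hunk.

    /* Bucket index = the 'order' most significant bits of the mixed hash. */
    static inline size_t bucket_idx_sketch(size_t mixed_hash, size_t order)
    {
        return mixed_hash >> (sizeof(size_t) * 8 - order);
    }

    /* Growing: old bucket n splits into new buckets 2n and 2n + 1. */
    static inline size_t grow_idx_sketch(size_t old_idx)
    {
        return old_idx << 1;
    }

    /* Shrinking: old buckets 2n and 2n + 1 join into new bucket n. */
    static inline size_t shrink_idx_sketch(size_t old_idx)
    {
        return old_idx >> 1;
    }

    /* Smallest mixed hash belonging to bucket new_idx, ie the split point. */
    static inline size_t split_hash_sketch(size_t new_idx, size_t new_order)
    {
        return new_idx << (sizeof(size_t) * 8 - new_order);
    }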
Second, old bucket heads are moved 149 * to the new bucket head array with the protocol mentioned in [5]. 149 * to the new bucket head array with the protocol mentioned in [5]. 150 150 * At this point updaters start using the new bucket heads. Third, 151 151 * buckets are split (or joined) so that the table can make use of 152 152 * the extra bucket head slots in the new array (or stop wasting space 153 153 * with the unnecessary extra slots in the old array). Splitting 154 * or joining buckets employs a custom protocol. Last, the new array 154 * or joining buckets employs a custom protocol. Last, the new array 155 155 * replaces the original bucket array. 156 * 156 * 157 157 * A single background work item (of the system work queue) guides 158 158 * resizing of the table. If an updater detects that the bucket it … … 160 160 * or it needs to be split/joined), it helps out and completes the 161 161 * head move or the bucket split/join. 162 * 163 * The table always grows or shrinks by a factor of 2. Because items 164 * are assigned a bucket based on the top k bits of their mixed hash 165 * values, when growing the table each bucket is split into two buckets 166 * and all items of the two new buckets come from the single bucket in the 162 * 163 * The table always grows or shrinks by a factor of 2. Because items 164 * are assigned a bucket based on the top k bits of their mixed hash 165 * values, when growing the table each bucket is split into two buckets 166 * and all items of the two new buckets come from the single bucket in the 167 167 * original table. Ie items from separate buckets in the original table 168 * never intermix in the new buckets. Moreover 169 * since the buckets are sorted by their mixed hash values the items 170 * at the beginning of the old bucket will end up in the first new 168 * never intermix in the new buckets. Moreover 169 * since the buckets are sorted by their mixed hash values the items 170 * at the beginning of the old bucket will end up in the first new 171 171 * bucket while all the remaining items of the old bucket will end up 172 * in the second new bucket. Therefore, there is a single point where 173 * to split the linked list of the old bucket into two correctly sorted 172 * in the second new bucket. Therefore, there is a single point where 173 * to split the linked list of the old bucket into two correctly sorted 174 174 * linked lists of the new buckets: 175 175 * .- bucket split 176 * | 177 * <-- first --> v <-- second --> 176 * | 177 * <-- first --> v <-- second --> 178 178 * [old] --> [00b] -> [01b] -> [10b] -> [11b] -> sentinel 179 * ^ ^ 180 * [new0] -- -+ | 179 * ^ ^ 180 * [new0] -- -+ | 181 181 * [new1] -- -- -- -- -- -- -- -+ 182 * 182 * 183 183 * Resize in greater detail: 184 * 185 * a) First, a resizer (a single background system work queue item 186 * in charge of resizing the table) allocates and initializes a new 187 * bucket head array. New bucket heads are pointed to the sentinel 188 * and marked Invalid (in the lower order bits of the pointer to the 184 * 185 * a) First, a resizer (a single background system work queue item 186 * in charge of resizing the table) allocates and initializes a new 187 * bucket head array. 
New bucket heads are pointed to the sentinel 188 * and marked Invalid (in the lower order bits of the pointer to the 189 189 * next item, ie the sentinel in this case): 190 * 190 * 191 191 * [old, N] --> [00b] -> [01b] -> [10b] -> [11b] -> sentinel 192 192 * ^ ^ 193 193 * [new0, Inv] -------------------------------------+ | 194 194 * [new1, Inv] ---------------------------------------+ 195 * 196 * 197 * b) Second, the resizer starts moving old bucket heads with the following 198 * lock-free protocol (from [5]) where cas(variable, expected_val, new_val) 195 * 196 * 197 * b) Second, the resizer starts moving old bucket heads with the following 198 * lock-free protocol (from [5]) where cas(variable, expected_val, new_val) 199 199 * is short for compare-and-swap: 200 * 200 * 201 201 * old head new0 head transition to next state 202 202 * -------- --------- ------------------------ 203 203 * addr, N sentinel, Inv cas(old, (addr, N), (addr, Const)) 204 * .. mark the old head as immutable, so that 205 * updaters do not relink it to other nodes 204 * .. mark the old head as immutable, so that 205 * updaters do not relink it to other nodes 206 206 * until the head move is done. 207 207 * addr, Const sentinel, Inv cas(new0, (sentinel, Inv), (addr, N)) 208 * .. move the address to the new head and mark 208 * .. move the address to the new head and mark 209 209 * the new head normal so updaters can start 210 210 * using it. … … 213 213 * the head move is done. 214 214 * addr, Inv addr, N 215 * 215 * 216 216 * Notice that concurrent updaters may step in at any point and correctly 217 217 * complete the head move without disrupting the resizer. At worst, the 218 * resizer or other concurrent updaters will attempt a number of CAS() that 218 * resizer or other concurrent updaters will attempt a number of CAS() that 219 219 * will correctly fail. 220 * 220 * 221 221 * [old, Inv] -> [00b] -> [01b] -> [10b] -> [11b] -> sentinel 222 222 * ^ ^ 223 223 * [new0, N] ----+ | 224 224 * [new1, Inv] --------------------------------------+ 225 * 226 * 227 * c) Third, buckets are split if the table is growing; or joined if 228 * shrinking (by the resizer or updaters depending on whoever accesses 225 * 226 * 227 * c) Third, buckets are split if the table is growing; or joined if 228 * shrinking (by the resizer or updaters depending on whoever accesses 229 229 * the bucket first). See split_bucket() and join_buckets() for details. 230 * 230 * 231 231 * 1) Mark the last item of new0 with JOIN_FOLLOWS: 232 232 * [old, Inv] -> [00b] -> [01b, JF] -> [10b] -> [11b] -> sentinel … … 234 234 * [new0, N] ----+ | 235 235 * [new1, Inv] ------------------------------------------+ 236 * 236 * 237 237 * 2) Mark the first item of new1 with JOIN_NODE: 238 238 * [old, Inv] -> [00b] -> [01b, JF] -> [10b, JN] -> [11b] -> sentinel … … 240 240 * [new0, N] ----+ | 241 241 * [new1, Inv] ----------------------------------------------+ 242 * 242 * 243 243 * 3) Point new1 to the join-node and mark new1 NORMAL. 
244 244 * [old, Inv] -> [00b] -> [01b, JF] -> [10b, JN] -> [11b] -> sentinel … … 246 246 * [new0, N] ----+ | 247 247 * [new1, N] --------------------------+ 248 * 249 * 250 * d) Fourth, the resizer cleans up extra marks added during bucket 248 * 249 * 250 * d) Fourth, the resizer cleans up extra marks added during bucket 251 251 * splits/joins but only when it is sure all updaters are accessing 252 252 * the table via the new bucket heads only (ie it is certain there 253 * are no delayed updaters unaware of the resize and accessing the 253 * are no delayed updaters unaware of the resize and accessing the 254 254 * table via the old bucket head). 255 * 255 * 256 256 * [old, Inv] ---+ 257 257 * v … … 259 259 * v 260 260 * [new1, N] --> [10b, N] -> [11b] -> sentinel 261 * 262 * 261 * 262 * 263 263 * e) Last, the resizer publishes the new bucket head array for everyone 264 264 * to see and use. This signals the end of the resize and the old bucket 265 * array is freed. 266 * 267 * 265 * array is freed. 266 * 267 * 268 268 * To understand details of how the table is resized, read [1, 3, 5] 269 269 * and comments in join_buckets(), split_bucket(). 270 * 271 * 272 * [1] High performance dynamic lock-free hash tables and list-based sets, 270 * 271 * 272 * [1] High performance dynamic lock-free hash tables and list-based sets, 273 273 * Michael, 2002 274 274 * http://www.research.ibm.com/people/m/michael/spaa-2002.pdf … … 311 311 #define CHT_MIN_BUCKET_CNT (1 << CHT_MIN_ORDER) 312 312 /* Does not have to be a power of 2. */ 313 #define CHT_MAX_LOAD 2 313 #define CHT_MAX_LOAD 2 314 314 315 315 typedef cht_ptr_t marked_ptr_t; 316 316 typedef bool (*equal_pred_t)(void *arg, const cht_link_t *item); 317 317 318 /** The following mark items and bucket heads. 319 * 318 /** The following mark items and bucket heads. 319 * 320 320 * They are stored in the two low order bits of the next item pointers. 321 321 * Some marks may be combined. Some marks share the same binary value and … … 327 327 N_NORMAL = 0, 328 328 /** Logically deleted item that might have already been unlinked. 329 * 330 * May be combined with N_JOIN and N_JOIN_FOLLOWS. Applicable only 331 * to items; never to bucket heads. 332 * 333 * Once marked deleted an item remains marked deleted. 329 * 330 * May be combined with N_JOIN and N_JOIN_FOLLOWS. Applicable only 331 * to items; never to bucket heads. 332 * 333 * Once marked deleted an item remains marked deleted. 334 334 */ 335 335 N_DELETED = 1, 336 /** Immutable bucket head. 337 * 338 * The bucket is being moved or joined with another and its (old) head 336 /** Immutable bucket head. 337 * 338 * The bucket is being moved or joined with another and its (old) head 339 339 * must not be modified. 340 * 340 * 341 341 * May be combined with N_INVALID. Applicable only to old bucket heads, 342 342 * ie cht_t.b and not cht_t.new_b. 343 343 */ 344 344 N_CONST = 1, 345 /** Invalid bucket head. The bucket head must not be modified. 346 * 345 /** Invalid bucket head. The bucket head must not be modified. 346 * 347 347 * Old bucket heads (ie cht_t.b) are marked invalid if they have 348 348 * already been moved to cht_t.new_b or if the bucket had already … … 354 354 N_INVALID = 3, 355 355 /** The item is a join node, ie joining two buckets 356 * 356 * 357 357 * A join node is either the first node of the second part of 358 358 * a bucket to be split; or it is the first node of the bucket 359 359 * to be merged into/appended to/joined with another bucket. 360 * 361 * May be combined with N_DELETED. 
Applicable only to items, never 360 * 361 * May be combined with N_DELETED. Applicable only to items, never 362 362 * to bucket heads. 363 * 363 * 364 364 * Join nodes are referred to from two different buckets and may, 365 365 * therefore, not be safely/atomically unlinked from both buckets. … … 369 369 */ 370 370 N_JOIN = 2, 371 /** The next node is a join node and will soon be marked so. 372 * 371 /** The next node is a join node and will soon be marked so. 372 * 373 373 * A join-follows node is the last node of the first part of bucket 374 374 * that is to be split, ie it is the last node that will remain 375 375 * in the same bucket after splitting it. 376 * 376 * 377 377 * May be combined with N_DELETED. Applicable to items as well 378 * as to bucket heads of the bucket to be split (but only in cht_t.new_b). 378 * as to bucket heads of the bucket to be split (but only in cht_t.new_b). 379 379 */ 380 380 N_JOIN_FOLLOWS = 2, … … 413 413 414 414 static size_t size_to_order(size_t bucket_cnt, size_t min_order); 415 static cht_buckets_t *alloc_buckets(size_t order, bool set_invalid, 415 static cht_buckets_t *alloc_buckets(size_t order, bool set_invalid, 416 416 bool can_block); 417 417 static inline cht_link_t *find_lazy(cht_t *h, void *key); 418 static cht_link_t *search_bucket(cht_t *h, marked_ptr_t head, void *key, 418 static cht_link_t *search_bucket(cht_t *h, marked_ptr_t head, void *key, 419 419 size_t search_hash); 420 static cht_link_t *find_resizing(cht_t *h, void *key, size_t hash, 420 static cht_link_t *find_resizing(cht_t *h, void *key, size_t hash, 421 421 marked_ptr_t old_head, size_t old_idx); 422 422 static bool insert_impl(cht_t *h, cht_link_t *item, cht_link_t **dup_item); 423 423 static bool insert_at(cht_link_t *item, const wnd_t *wnd, walk_mode_t walk_mode, 424 424 bool *resizing); 425 static bool has_duplicate(cht_t *h, const cht_link_t *item, size_t hash, 425 static bool has_duplicate(cht_t *h, const cht_link_t *item, size_t hash, 426 426 cht_link_t *cur, cht_link_t **dup_item); 427 static cht_link_t *find_duplicate(cht_t *h, const cht_link_t *item, size_t hash, 427 static cht_link_t *find_duplicate(cht_t *h, const cht_link_t *item, size_t hash, 428 428 cht_link_t *start); 429 429 static bool remove_pred(cht_t *h, size_t hash, equal_pred_t pred, void *pred_arg); 430 static bool delete_at(cht_t *h, wnd_t *wnd, walk_mode_t walk_mode, 430 static bool delete_at(cht_t *h, wnd_t *wnd, walk_mode_t walk_mode, 431 431 bool *deleted_but_gc, bool *resizing); 432 432 static bool mark_deleted(cht_link_t *cur, walk_mode_t walk_mode, bool *resizing); 433 433 static bool unlink_from_pred(wnd_t *wnd, walk_mode_t walk_mode, bool *resizing); 434 static bool find_wnd_and_gc_pred(cht_t *h, size_t hash, walk_mode_t walk_mode, 434 static bool find_wnd_and_gc_pred(cht_t *h, size_t hash, walk_mode_t walk_mode, 435 435 equal_pred_t pred, void *pred_arg, wnd_t *wnd, bool *resizing); 436 static bool find_wnd_and_gc(cht_t *h, size_t hash, walk_mode_t walk_mode, 436 static bool find_wnd_and_gc(cht_t *h, size_t hash, walk_mode_t walk_mode, 437 437 wnd_t *wnd, bool *resizing); 438 438 static bool gc_deleted_node(cht_t *h, walk_mode_t walk_mode, wnd_t *wnd, 439 439 bool *resizing); 440 440 static bool join_completed(cht_t *h, const wnd_t *wnd); 441 static void upd_resizing_head(cht_t *h, size_t hash, marked_ptr_t **phead, 441 static void upd_resizing_head(cht_t *h, size_t hash, marked_ptr_t **phead, 442 442 bool *join_finishing, walk_mode_t *walk_mode); 443 443 static void item_removed(cht_t *h); … 
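Two details the marks above rely on can be sketched briefly. First, a mark shares a machine word with the next pointer: the two low order bits carry the mark_t, which is why N_DELETED can be combined with N_JOIN or N_JOIN_FOLLOWS. Second, the step b) head-move protocol from the resize overview is essentially three cas_link() calls over such marked pointers. Both sketches rest on assumptions (eg that marked_ptr_t is an unsigned integer type wide enough for a pointer); the file's real make_link(), get_next(), get_mark(), mark_const() and complete_head_move() are the authoritative versions.

    /* Sketch of the mark packing: the low two bits of the word hold the mark. */
    static inline marked_ptr_t make_link_sketch(const cht_link_t *next, mark_t mark)
    {
        return (marked_ptr_t)(uintptr_t) next | (marked_ptr_t) mark;
    }

    static inline cht_link_t *get_next_sketch(marked_ptr_t link)
    {
        return (cht_link_t *)(uintptr_t)(link & ~(marked_ptr_t) 3);
    }

    static inline mark_t get_mark_sketch(marked_ptr_t link)
    {
        return (mark_t)(link & 3);
    }

    /*
     * Condensed sketch of the step b) head move (mark_const() followed by
     * complete_head_move()). The real helpers retry/assert where needed;
     * concurrent updaters may help out and their extra CASes simply fail.
     */
    static void head_move_sketch(marked_ptr_t *pold_head, marked_ptr_t *pnew_head)
    {
        /* Freeze the old head so updaters stop relinking it. */
        cht_link_t *next = get_next(*pold_head);
        cas_link(pold_head, next, N_NORMAL, next, N_CONST);
        cas_order_barrier();

        /* Publish the old head's first node in the new head slot. */
        next = get_next(*pold_head);
        cas_link(pnew_head, &sentinel, N_INVALID, next, N_NORMAL);
        cas_order_barrier();

        /* Retire the old head; lookups and updates now go via the new head. */
        cas_link(pold_head, next, N_CONST, next, N_INVALID);
        cas_order_barrier();
    }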
… 448 448 static void mark_const(marked_ptr_t *psrc_head); 449 449 static void complete_head_move(marked_ptr_t *psrc_head, marked_ptr_t *pdest_head); 450 static void split_bucket(cht_t *h, marked_ptr_t *psrc_head, 450 static void split_bucket(cht_t *h, marked_ptr_t *psrc_head, 451 451 marked_ptr_t *pdest_head, size_t split_hash); 452 static void mark_join_follows(cht_t *h, marked_ptr_t *psrc_head, 452 static void mark_join_follows(cht_t *h, marked_ptr_t *psrc_head, 453 453 size_t split_hash, wnd_t *wnd); 454 454 static void mark_join_node(cht_link_t *join_node); 455 static void join_buckets(cht_t *h, marked_ptr_t *psrc_head, 455 static void join_buckets(cht_t *h, marked_ptr_t *psrc_head, 456 456 marked_ptr_t *pdest_head, size_t split_hash); 457 static void link_to_join_node(cht_t *h, marked_ptr_t *pdest_head, 457 static void link_to_join_node(cht_t *h, marked_ptr_t *pdest_head, 458 458 cht_link_t *join_node, size_t split_hash); 459 459 static void resize_table(work_t *arg); … … 461 461 static void shrink_table(cht_t *h); 462 462 static void cleanup_join_node(cht_t *h, marked_ptr_t *new_head); 463 static void clear_join_and_gc(cht_t *h, cht_link_t *join_node, 463 static void clear_join_and_gc(cht_t *h, cht_link_t *join_node, 464 464 marked_ptr_t *new_head); 465 465 static void cleanup_join_follows(cht_t *h, marked_ptr_t *new_head); … … 478 478 static size_t grow_idx(size_t idx); 479 479 static size_t shrink_idx(size_t idx); 480 static marked_ptr_t cas_link(marked_ptr_t *link, const cht_link_t *cur_next, 480 static marked_ptr_t cas_link(marked_ptr_t *link, const cht_link_t *cur_next, 481 481 mark_t cur_mark, const cht_link_t *new_next, mark_t new_mark); 482 static marked_ptr_t _cas_link(marked_ptr_t *link, marked_ptr_t cur, 482 static marked_ptr_t _cas_link(marked_ptr_t *link, marked_ptr_t cur, 483 483 marked_ptr_t new); 484 484 static void cas_order_barrier(void); … … 490 490 491 491 /** Creates a concurrent hash table. 492 * 492 * 493 493 * @param h Valid pointer to a cht_t instance. 494 494 * @param op Item specific operations. All operations are compulsory. … … 497 497 bool cht_create_simple(cht_t *h, cht_ops_t *op) 498 498 { 499 return cht_create(h, 0, 0, 0, false, op); 499 return cht_create(h, 0, 0, 0, false, op); 500 500 } 501 501 502 502 /** Creates a concurrent hash table. 503 * 503 * 504 504 * @param h Valid pointer to a cht_t instance. 505 505 * @param init_size The initial number of buckets the table should contain. … … 513 513 * before the table grows. 514 514 * @param can_block If true creating the table blocks until enough memory 515 * is available (possibly indefinitely). Otherwise, 515 * is available (possibly indefinitely). Otherwise, 516 516 * table creation does not block and returns immediately 517 * even if not enough memory is available. 517 * even if not enough memory is available. 518 518 * @param op Item specific operations. All operations are compulsory. 519 519 * @return True if successfully created the table. False otherwise. 520 520 */ 521 bool cht_create(cht_t *h, size_t init_size, size_t min_size, size_t max_load, 521 bool cht_create(cht_t *h, size_t init_size, size_t min_size, size_t max_load, 522 522 bool can_block, cht_ops_t *op) 523 523 { … … 551 551 } 552 552 553 /* 553 /* 554 554 * Cached item hashes are stored in item->rcu_link.func. Once the item 555 555 * is deleted rcu_link.func will contain the value of invalid_hash. … … 564 564 565 565 /** Allocates and initializes 2^order buckets. 
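For orientation, a hypothetical set of operations and table creation for the value_t type from the earlier lookup sketch could look like this. Only key_equal(), key_hash() and remove_callback() are named in this file's comments; the remaining cht_ops_t member names and all signatures shown here are assumptions made for illustration.

    static size_t val_hash(const cht_link_t *item)
    {
        return (size_t) value_from_link(item)->key;
    }

    static size_t val_key_hash(void *key)
    {
        return (size_t) *(int *) key;
    }

    static bool val_equal(const cht_link_t *item1, const cht_link_t *item2)
    {
        return value_from_link(item1)->key == value_from_link(item2)->key;
    }

    static bool val_key_equal(void *key, const cht_link_t *item)
    {
        return *(int *) key == value_from_link(item)->key;
    }

    static void val_removed(cht_link_t *item)
    {
        /* Invoked via RCU once no preexisting reader can hold the item. */
        free(value_from_link(item));
    }

    static cht_ops_t val_ops = {
        .hash = val_hash,
        .key_hash = val_key_hash,
        .equal = val_equal,
        .key_equal = val_key_equal,
        .remove_callback = val_removed,
    };

    static cht_t val_table;

    static bool val_table_init(void)
    {
        /* Defaults: minimum bucket count, CHT_MAX_LOAD, non-blocking allocation. */
        return cht_create_simple(&val_table, &val_ops);
    }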
566 * 566 * 567 567 * All bucket heads are initialized to point to the sentinel node. 568 * 568 * 569 569 * @param order The number of buckets to allocate is 2^order. 570 570 * @param set_invalid Bucket heads are marked invalid if true; otherwise 571 571 * they are marked N_NORMAL. 572 572 * @param can_block If true memory allocation blocks until enough memory 573 * is available (possibly indefinitely). Otherwise, 574 * memory allocation does not block. 573 * is available (possibly indefinitely). Otherwise, 574 * memory allocation does not block. 575 575 * @return Newly allocated and initialized buckets or NULL if not enough memory. 576 576 */ … … 578 578 { 579 579 size_t bucket_cnt = (1 << order); 580 size_t bytes = 580 size_t bytes = 581 581 sizeof(cht_buckets_t) + (bucket_cnt - 1) * sizeof(marked_ptr_t); 582 582 cht_buckets_t *b = malloc(bytes, can_block ? 0 : FRAME_ATOMIC); … … 587 587 b->order = order; 588 588 589 marked_ptr_t head_link = set_invalid 590 ? make_link(&sentinel, N_INVALID) 589 marked_ptr_t head_link = set_invalid 590 ? make_link(&sentinel, N_INVALID) 591 591 : make_link(&sentinel, N_NORMAL); 592 592 … … 615 615 616 616 /** Destroys a CHT successfully created via cht_create(). 617 * 617 * 618 618 * Waits for all outstanding concurrent operations to complete and 619 619 * frees internal allocated resources. The table is however not cleared … … 644 644 645 645 /** Returns the first item equal to the search key or NULL if not found. 646 * 647 * The call must be enclosed in a rcu_read_lock() unlock() pair. The 646 * 647 * The call must be enclosed in a rcu_read_lock() unlock() pair. The 648 648 * returned item is guaranteed to be allocated until rcu_read_unlock() 649 649 * although the item may be concurrently removed from the table by another 650 650 * cpu. 651 * 651 * 652 652 * Further items matching the key may be retrieved via cht_find_next(). 653 * 653 * 654 654 * cht_find() sees the effects of any completed cht_remove(), cht_insert(). 655 655 * If a concurrent remove or insert had not yet completed cht_find() may 656 656 * or may not see the effects of it (eg it may find an item being removed). 657 * 657 * 658 658 * @param h CHT to operate on. 659 659 * @param key Search key as defined by cht_ops_t.key_equal() and .key_hash(). … … 668 668 669 669 /** Returns the first item equal to the search key or NULL if not found. 670 * 671 * Unlike cht_find(), cht_find_lazy() may not see the effects of 670 * 671 * Unlike cht_find(), cht_find_lazy() may not see the effects of 672 672 * cht_remove() or cht_insert() even though they have already completed. 673 673 * It may take a couple of milliseconds for those changes to propagate 674 * and become visible to cht_find_lazy(). On the other hand, cht_find_lazy() 674 * and become visible to cht_find_lazy(). On the other hand, cht_find_lazy() 675 675 * operates a bit faster than cht_find(). 676 * 676 * 677 677 * See cht_find() for more details. 678 678 */ … … 693 693 cht_buckets_t *b = rcu_access(h->b); 694 694 size_t idx = calc_bucket_idx(hash, b->order); 695 /* 696 * No need for access_once. b->head[idx] will point to an allocated node 695 /* 696 * No need for access_once. b->head[idx] will point to an allocated node 697 697 * even if marked invalid until we exit rcu read section. 698 698 */ … … 706 706 } 707 707 708 /** Returns the next item matching \a item. 709 * 710 * Must be enclosed in a rcu_read_lock()/unlock() pair. Effects of 708 /** Returns the next item matching \a item. 
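A lookup that walks all items equal to a key combines cht_find() with cht_find_next() inside a single RCU read section; a small sketch (assuming, as with cht_find(), that NULL signals no further match):

    static size_t count_matches(cht_t *h, int *key)
    {
        size_t cnt = 0;

        rcu_read_lock();

        for (cht_link_t *cur = cht_find(h, key); cur != NULL;
            cur = cht_find_next(h, cur)) {
            ++cnt;
        }

        rcu_read_unlock();
        return cnt;
    }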
709 * 710 * Must be enclosed in a rcu_read_lock()/unlock() pair. Effects of 711 711 * completed cht_remove(), cht_insert() are guaranteed to be visible 712 712 * to cht_find_next(). 713 * 714 * See cht_find() for more details. 713 * 714 * See cht_find() for more details. 715 715 */ 716 716 cht_link_t *cht_find_next(cht_t *h, const cht_link_t *item) … … 721 721 } 722 722 723 /** Returns the next item matching \a item. 724 * 725 * Must be enclosed in a rcu_read_lock()/unlock() pair. Effects of 723 /** Returns the next item matching \a item. 724 * 725 * Must be enclosed in a rcu_read_lock()/unlock() pair. Effects of 726 726 * completed cht_remove(), cht_insert() may or may not be visible 727 727 * to cht_find_next_lazy(). 728 * 729 * See cht_find_lazy() for more details. 728 * 729 * See cht_find_lazy() for more details. 730 730 */ 731 731 cht_link_t *cht_find_next_lazy(cht_t *h, const cht_link_t *item) … … 739 739 740 740 /** Searches the bucket at head for key using search_hash. */ 741 static inline cht_link_t *search_bucket(cht_t *h, marked_ptr_t head, void *key, 741 static inline cht_link_t *search_bucket(cht_t *h, marked_ptr_t head, void *key, 742 742 size_t search_hash) 743 743 { 744 /* 744 /* 745 745 * It is safe to access nodes even outside of this bucket (eg when 746 * splitting the bucket). The resizer makes sure that any node we 746 * splitting the bucket). The resizer makes sure that any node we 747 747 * may find by following the next pointers is allocated. 748 748 */ … … 759 759 } while (node_hash(h, cur) < search_hash); 760 760 761 /* 761 /* 762 762 * Only search for an item with an equal key if cur is not the sentinel 763 * node or a node with a different hash. 763 * node or a node with a different hash. 764 764 */ 765 765 while (node_hash(h, cur) == search_hash) { … … 771 771 cur = get_next(cur->link); 772 772 assert(cur); 773 } 774 775 /* 773 } 774 775 /* 776 776 * In the unlikely case that we have encountered a node whose cached 777 777 * hash has been overwritten due to a pending rcu_call for it, skip … … 787 787 788 788 /** Searches for the key while the table is undergoing a resize. */ 789 static cht_link_t *find_resizing(cht_t *h, void *key, size_t hash, 789 static cht_link_t *find_resizing(cht_t *h, void *key, size_t hash, 790 790 marked_ptr_t old_head, size_t old_idx) 791 791 { 792 assert(N_INVALID == get_mark(old_head)); 792 assert(N_INVALID == get_mark(old_head)); 793 793 assert(h->new_b); 794 794 … … 799 799 /* Growing. */ 800 800 if (h->b->order < h->new_b->order) { 801 /* 801 /* 802 802 * Old bucket head is invalid, so it must have been already 803 803 * moved. Make the new head visible if still not visible, ie … … 805 805 */ 806 806 if (N_INVALID == get_mark(new_head)) { 807 /* 807 /* 808 808 * We should be searching a newly added bucket but the old 809 * moved bucket has not yet been split (its marked invalid) 810 * or we have not yet seen the split. 809 * moved bucket has not yet been split (its marked invalid) 810 * or we have not yet seen the split. 811 811 */ 812 812 if (grow_idx(old_idx) != new_idx) { 813 /* 813 /* 814 814 * Search the moved bucket. It is guaranteed to contain 815 815 * items of the newly added bucket that were present … … 821 821 /* new_head is now the moved bucket, either valid or invalid. */ 822 822 823 /* 823 /* 824 824 * The old bucket was definitely moved to new_head but the 825 825 * change of new_head had not yet propagated to this cpu. 
… … 829 829 * We could issue a read_barrier() and make the now valid 830 830 * moved bucket head new_head visible, but instead fall back 831 * on using the old bucket. Although the old bucket head is 832 * invalid, it points to a node that is allocated and in the 831 * on using the old bucket. Although the old bucket head is 832 * invalid, it points to a node that is allocated and in the 833 833 * right bucket. Before the node can be freed, it must be 834 834 * unlinked from the head (or another item after that item 835 * modified the new_head) and a grace period must elapse. 835 * modified the new_head) and a grace period must elapse. 836 836 * As a result had the node been already freed the grace 837 837 * period preceeding the free() would make the unlink and … … 855 855 856 856 /* 857 * h->b->head[move_src_idx] had already been moved to new_head 857 * h->b->head[move_src_idx] had already been moved to new_head 858 858 * but the change to new_head had not yet propagated to us. 859 859 */ 860 860 if (N_INVALID == get_mark(new_head)) { 861 861 /* 862 * new_head is definitely valid and we could make it visible 863 * to this cpu with a read_barrier(). Instead, use the bucket 864 * in the old table that was moved even though it is now marked 862 * new_head is definitely valid and we could make it visible 863 * to this cpu with a read_barrier(). Instead, use the bucket 864 * in the old table that was moved even though it is now marked 865 865 * as invalid. The node it points to must be allocated because 866 866 * a grace period would have to elapse before it could be freed; 867 * and the grace period would make the now valid new_head 868 * visible to all cpus. 869 * 867 * and the grace period would make the now valid new_head 868 * visible to all cpus. 869 * 870 870 * Note that move_src_idx may not be the same as old_idx. 871 871 * If move_src_idx != old_idx then old_idx is the bucket … … 873 873 * appended to the moved bucket, ie it is added at the tail 874 874 * of new_head. In that case an invalid old_head notes that 875 * it had already been merged into (the moved) new_head. 875 * it had already been merged into (the moved) new_head. 876 876 * We will try to search that bucket first because it 877 * may contain some newly added nodes after the bucket 878 * join. Moreover, the bucket joining link may already be 877 * may contain some newly added nodes after the bucket 878 * join. Moreover, the bucket joining link may already be 879 879 * visible even if new_head is not. Therefore, if we're 880 880 * lucky we'll find the item via moved_old_head. In any … … 895 895 if (move_src_idx != old_idx && get_next(old_head) != &sentinel) { 896 896 /* 897 * Note that old_head (the bucket to be merged into new_head) 898 * points to an allocated join node (if non-null) even if marked 897 * Note that old_head (the bucket to be merged into new_head) 898 * points to an allocated join node (if non-null) even if marked 899 899 * invalid. Before the resizer lets join nodes to be unlinked 900 900 * (and freed) it sets old_head to NULL and waits for a grace period. … … 909 909 return NULL; 910 910 } else { 911 /* 911 /* 912 912 * Resize is almost done. The resizer is waiting to make 913 913 * sure all cpus see that the new table replaced the old one. 914 914 */ 915 915 assert(h->b->order == h->new_b->order); 916 /* 916 /* 917 917 * The resizer must ensure all new bucket heads are visible before 918 918 * replacing the old table. … … 929 929 } 930 930 931 /** Inserts a unique item. 
Returns false if an equal item was already present. 932 * 931 /** Inserts a unique item. Returns false if an equal item was already present. 932 * 933 933 * Use this function to atomically check if an equal/duplicate item had 934 * not yet been inserted into the table and to insert this item into the 934 * not yet been inserted into the table and to insert this item into the 935 935 * table. 936 * 936 * 937 937 * The following is @e NOT thread-safe, so do not use: 938 938 * @code … … 944 944 * } 945 945 * @endcode 946 * 946 * 947 947 * Replace such code with: 948 948 * @code … … 951 951 * // Whoops, someone beat us to it - an equal item 'dup_item' 952 952 * // had already been inserted. 953 * free(item); 953 * free(item); 954 954 * } else { 955 955 * // Successfully inserted the item and we are guaranteed that … … 957 957 * } 958 958 * @endcode 959 * 959 * 960 960 */ 961 961 bool cht_insert_unique(cht_t *h, cht_link_t *item, cht_link_t **dup_item) … … 1007 1007 } 1008 1008 1009 inserted = insert_at(item, &wnd, walk_mode, &resizing); 1009 inserted = insert_at(item, &wnd, walk_mode, &resizing); 1010 1010 } while (!inserted); 1011 1011 … … 1016 1016 } 1017 1017 1018 /** Inserts item between wnd.ppred and wnd.cur. 1019 * 1018 /** Inserts item between wnd.ppred and wnd.cur. 1019 * 1020 1020 * @param item Item to link to wnd.ppred and wnd.cur. 1021 1021 * @param wnd The item will be inserted before wnd.cur. Wnd.ppred 1022 1022 * must be N_NORMAL. 1023 * @param walk_mode 1024 * @param resizing Set to true only if the table is undergoing resize 1023 * @param walk_mode 1024 * @param resizing Set to true only if the table is undergoing resize 1025 1025 * and it was not expected (ie walk_mode == WM_NORMAL). 1026 1026 * @return True if the item was successfully linked to wnd.ppred. False … … 1028 1028 * of wnd.cur has changed. 1029 1029 */ 1030 inline static bool insert_at(cht_link_t *item, const wnd_t *wnd, 1030 inline static bool insert_at(cht_link_t *item, const wnd_t *wnd, 1031 1031 walk_mode_t walk_mode, bool *resizing) 1032 1032 { … … 1076 1076 1077 1077 /** Returns true if the chain starting at cur has an item equal to \a item. 1078 * 1078 * 1079 1079 * @param h CHT to operate on. 1080 1080 * @param item Item whose duplicates the function looks for. … … 1084 1084 * @return True if a non-deleted item equal to \a item exists in the table. 1085 1085 */ 1086 static inline bool has_duplicate(cht_t *h, const cht_link_t *item, size_t hash, 1086 static inline bool has_duplicate(cht_t *h, const cht_link_t *item, size_t hash, 1087 1087 cht_link_t *cur, cht_link_t **dup_item) 1088 1088 { … … 1095 1095 return false; 1096 1096 1097 /* 1098 * Load the most recent node marks. Otherwise we might pronounce a 1099 * logically deleted node for a duplicate of the item just because 1097 /* 1098 * Load the most recent node marks. Otherwise we might pronounce a 1099 * logically deleted node for a duplicate of the item just because 1100 1100 * the deleted node's DEL mark had not yet propagated to this cpu. 1101 1101 */ … … 1107 1107 1108 1108 /** Returns an item that is equal to \a item starting in a chain at \a start. 
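The body of insert_at() is elided from this hunk; its core for the ordinary WM_NORMAL walk boils down to pointing the new item at wnd->cur and then swinging the predecessor link with a single CAS. A condensed sketch (the real function also handles the resize walk modes and issues a barrier so the item is fully initialized before it becomes reachable):

    static bool insert_at_sketch(cht_link_t *item, const wnd_t *wnd)
    {
        /* The new item points at its successor before it is published. */
        item->link = make_link(wnd->cur, N_NORMAL);

        /* Publish it by swinging the clean/normal predecessor link. */
        marked_ptr_t ret =
            cas_link(wnd->ppred, wnd->cur, N_NORMAL, item, N_NORMAL);

        /* Any interleaved insert/remove changes *wnd->ppred, the CAS fails
         * and the caller rereads the bucket and retries. */
        return ret == make_link(wnd->cur, N_NORMAL);
    }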
*/ 1109 static cht_link_t *find_duplicate(cht_t *h, const cht_link_t *item, size_t hash, 1109 static cht_link_t *find_duplicate(cht_t *h, const cht_link_t *item, size_t hash, 1110 1110 cht_link_t *start) 1111 1111 { … … 1114 1114 cht_link_t *cur = start; 1115 1115 1116 try_again: 1116 try_again: 1117 1117 assert(cur); 1118 1118 … … 1128 1128 cur = get_next(cur->link); 1129 1129 assert(cur); 1130 } 1130 } 1131 1131 1132 1132 /* Skip logically deleted nodes with rcu_call() in progress. */ … … 1147 1147 size_t removed = 0; 1148 1148 1149 while (remove_pred(h, hash, h->op->key_equal, key)) 1149 while (remove_pred(h, hash, h->op->key_equal, key)) 1150 1150 ++removed; 1151 1151 … … 1153 1153 } 1154 1154 1155 /** Removes a specific item from the table. 1156 * 1157 * The called must hold rcu read lock. 1158 * 1155 /** Removes a specific item from the table. 1156 * 1157 * The called must hold rcu read lock. 1158 * 1159 1159 * @param item Item presumably present in the table and to be removed. 1160 1160 * @return True if the item was removed successfully; or false if it had 1161 * already been deleted. 1161 * already been deleted. 1162 1162 */ 1163 1163 bool cht_remove_item(cht_t *h, cht_link_t *item) … … 1168 1168 assert(rcu_read_locked()); 1169 1169 1170 /* 1170 /* 1171 1171 * Even though we know the node we want to delete we must unlink it 1172 * from the correct bucket and from a clean/normal predecessor. Therefore, 1172 * from the correct bucket and from a clean/normal predecessor. Therefore, 1173 1173 * we search for it again from the beginning of the correct bucket. 1174 1174 */ … … 1213 1213 } 1214 1214 1215 /* 1215 /* 1216 1216 * The item lookup is affected by a bucket join but effects of 1217 1217 * the bucket join have not been seen while searching for the item. 1218 1218 */ 1219 1219 if (join_finishing && !join_completed(h, &wnd)) { 1220 /* 1221 * Bucket was appended at the end of another but the next 1222 * ptr linking them together was not visible on this cpu. 1220 /* 1221 * Bucket was appended at the end of another but the next 1222 * ptr linking them together was not visible on this cpu. 1223 1223 * join_completed() makes this appended bucket visible. 1224 1224 */ … … 1237 1237 } 1238 1238 1239 deleted = delete_at(h, &wnd, walk_mode, &deleted_but_gc, &resizing); 1239 deleted = delete_at(h, &wnd, walk_mode, &deleted_but_gc, &resizing); 1240 1240 } while (!deleted || deleted_but_gc); 1241 1241 … … 1245 1245 1246 1246 /** Unlinks wnd.cur from wnd.ppred and schedules a deferred free for the item. 1247 * 1247 * 1248 1248 * Ignores nodes marked N_JOIN if walk mode is WM_LEAVE_JOIN. 1249 * 1249 * 1250 1250 * @param h CHT to operate on. 1251 1251 * @param wnd Points to the item to delete and its N_NORMAL predecessor. 1252 1252 * @param walk_mode Bucket chaing walk mode. 1253 * @param deleted_but_gc Set to true if the item had been logically deleted, 1253 * @param deleted_but_gc Set to true if the item had been logically deleted, 1254 1254 * but a garbage collecting walk of the bucket is in order for 1255 * it to be fully unlinked. 1255 * it to be fully unlinked. 1256 1256 * @param resizing Set to true if the table is undergoing an unexpected 1257 1257 * resize (ie walk_mode == WM_NORMAL). … … 1259 1259 * delete operation must be retried. 
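Usage of the two removal flavours, again with the hypothetical value_t from the lookup sketch. It is assumed here that cht_remove_key() returns the number of items removed (consistent with the counter in the snippet above) and that only cht_remove_item() requires the caller to hold the RCU read lock, as its comment states.

    static void removal_examples(cht_t *h, int *key, value_t *v)
    {
        /*
         * Remove every item equal to *key. The items are freed later,
         * via rcu_call() and op->remove_callback(), once a grace period
         * elapses - never free a removed item directly.
         */
        size_t removed = cht_remove_key(h, key);
        (void) removed;

        /* Remove one specific item; the caller must be in an RCU read section. */
        rcu_read_lock();
        bool ok = cht_remove_item(h, &v->link);
        (void) ok;
        rcu_read_unlock();
    }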
1260 1260 */ 1261 static inline bool delete_at(cht_t *h, wnd_t *wnd, walk_mode_t walk_mode, 1261 static inline bool delete_at(cht_t *h, wnd_t *wnd, walk_mode_t walk_mode, 1262 1262 bool *deleted_but_gc, bool *resizing) 1263 1263 { … … 1289 1289 1290 1290 /** Marks cur logically deleted. Returns false to request a retry. */ 1291 static inline bool mark_deleted(cht_link_t *cur, walk_mode_t walk_mode, 1291 static inline bool mark_deleted(cht_link_t *cur, walk_mode_t walk_mode, 1292 1292 bool *resizing) 1293 1293 { 1294 1294 assert(cur && cur != &sentinel); 1295 1295 1296 /* 1296 /* 1297 1297 * Btw, we could loop here if the cas fails but let's not complicate 1298 * things and let's retry from the head of the bucket. 1298 * things and let's retry from the head of the bucket. 1299 1299 */ 1300 1300 … … 1315 1315 mark_t cur_mark = get_mark(cur->link) & N_JOIN_FOLLOWS; 1316 1316 1317 marked_ptr_t ret = 1317 marked_ptr_t ret = 1318 1318 cas_link(&cur->link, next, cur_mark, next, cur_mark | N_DELETED); 1319 1319 1320 1320 if (ret != make_link(next, cur_mark)) 1321 1321 return false; 1322 } 1322 } 1323 1323 1324 1324 return true; … … 1326 1326 1327 1327 /** Unlinks wnd.cur from wnd.ppred. Returns false if it should be retried. */ 1328 static inline bool unlink_from_pred(wnd_t *wnd, walk_mode_t walk_mode, 1328 static inline bool unlink_from_pred(wnd_t *wnd, walk_mode_t walk_mode, 1329 1329 bool *resizing) 1330 1330 { … … 1355 1355 marked_ptr_t pred_link = make_link(wnd->cur, N_NORMAL); 1356 1356 /* Link to cur's successor keeping/copying cur's JF mark. */ 1357 marked_ptr_t next_link = make_link(next, cur_mark); 1357 marked_ptr_t next_link = make_link(next, cur_mark); 1358 1358 1359 1359 marked_ptr_t ret = _cas_link(wnd->ppred, pred_link, next_link); … … 1361 1361 if (pred_link != ret) { 1362 1362 /* If we're not resizing the table there are no JF/JN nodes. */ 1363 *resizing = (walk_mode == WM_NORMAL) 1363 *resizing = (walk_mode == WM_NORMAL) 1364 1364 && (N_JOIN_FOLLOWS & get_mark(ret)); 1365 1365 return false; … … 1371 1371 1372 1372 /** Finds the first non-deleted item equal to \a pred_arg according to \a pred. 1373 * 1373 * 1374 1374 * The function returns the candidate item in \a wnd. Logically deleted 1375 1375 * nodes are garbage collected so the predecessor will most likely not 1376 * be marked as deleted. 1377 * 1376 * be marked as deleted. 1377 * 1378 1378 * Unlike find_wnd_and_gc(), this function never returns a node that 1379 1379 * is known to have already been marked N_DELETED. … … 1382 1382 * collected, ie free in the background via rcu_call (except for join-nodes 1383 1383 * if walk_mode == WM_LEAVE_JOIN). 1384 * 1384 * 1385 1385 * @param h CHT to operate on. 1386 1386 * @param hash Hash the search for. … … 1392 1392 * @param resizing Set to true if the table is resizing but it was not 1393 1393 * expected (ie walk_mode == WM_NORMAL). 1394 * @return False if the operation has to be retried. True otherwise 1394 * @return False if the operation has to be retried. True otherwise 1395 1395 * (even if an equal item had not been found). 
1396 1396 */ 1397 static bool find_wnd_and_gc_pred(cht_t *h, size_t hash, walk_mode_t walk_mode, 1397 static bool find_wnd_and_gc_pred(cht_t *h, size_t hash, walk_mode_t walk_mode, 1398 1398 equal_pred_t pred, void *pred_arg, wnd_t *wnd, bool *resizing) 1399 1399 { … … 1403 1403 return true; 1404 1404 1405 /* 1406 * A read barrier is not needed here to bring up the most recent 1405 /* 1406 * A read barrier is not needed here to bring up the most recent 1407 1407 * node marks (esp the N_DELETED). At worst we'll try to delete 1408 1408 * an already deleted node; fail in delete_at(); and retry. … … 1411 1411 size_t cur_hash; 1412 1412 1413 try_again: 1413 try_again: 1414 1414 cur_hash = node_hash(h, wnd->cur); 1415 1415 … … 1445 1445 1446 1446 /** Find the first item (deleted or not) with a hash greater or equal to \a hash. 1447 * 1448 * The function returns the first item with a hash that is greater or 1447 * 1448 * The function returns the first item with a hash that is greater or 1449 1449 * equal to \a hash in \a wnd. Moreover it garbage collects logically 1450 1450 * deleted node that have not yet been unlinked and freed. Therefore, 1451 1451 * the returned node's predecessor will most likely be N_NORMAL. 1452 * 1452 * 1453 1453 * Unlike find_wnd_and_gc_pred(), this function may return a node 1454 1454 * that is known to had been marked N_DELETED. 1455 * 1455 * 1456 1456 * @param h CHT to operate on. 1457 1457 * @param hash Hash of the item to find. 1458 1458 * @param walk_mode Bucket chain walk mode. 1459 * @param[in,out] wnd wnd.cur denotes the first node of the chain. If the 1460 * the operation is successful, \a wnd points to the desired 1459 * @param[in,out] wnd wnd.cur denotes the first node of the chain. If the 1460 * the operation is successful, \a wnd points to the desired 1461 1461 * item. 1462 1462 * @param resizing Set to true if a table resize was detected but walk_mode 1463 1463 * suggested the table was not undergoing a resize. 1464 * @return False indicates the operation must be retried. True otherwise 1464 * @return False indicates the operation must be retried. True otherwise 1465 1465 * (even if an item with exactly the same has was not found). 1466 1466 */ 1467 static bool find_wnd_and_gc(cht_t *h, size_t hash, walk_mode_t walk_mode, 1467 static bool find_wnd_and_gc(cht_t *h, size_t hash, walk_mode_t walk_mode, 1468 1468 wnd_t *wnd, bool *resizing) 1469 1469 { … … 1505 1505 } else { 1506 1506 /* Ordinary deleted node or a deleted JOIN_FOLLOWS. */ 1507 assert(walk_mode != WM_LEAVE_JOIN 1507 assert(walk_mode != WM_LEAVE_JOIN 1508 1508 || !((N_JOIN | N_JOIN_FOLLOWS) & get_mark(wnd->cur->link))); 1509 1509 … … 1525 1525 1526 1526 /** Returns true if a bucket join had already completed. 1527 * 1528 * May only be called if upd_resizing_head() indicates a bucket join 1527 * 1528 * May only be called if upd_resizing_head() indicates a bucket join 1529 1529 * may be in progress. 1530 * 1530 * 1531 1531 * If it returns false, the search must be retried in order to guarantee 1532 1532 * all item that should have been encountered have been seen. … … 1534 1534 static bool join_completed(cht_t *h, const wnd_t *wnd) 1535 1535 { 1536 /* 1537 * The table is shrinking and the searched for item is in a bucket 1538 * appended to another. Check that the link joining these two buckets 1536 /* 1537 * The table is shrinking and the searched for item is in a bucket 1538 * appended to another. 
Check that the link joining these two buckets 1539 1539 * is visible and if not, make it visible to this cpu. 1540 1540 */ 1541 1541 1542 /* 1543 * Resizer ensures h->b->order stays the same for the duration of this 1542 /* 1543 * Resizer ensures h->b->order stays the same for the duration of this 1544 1544 * func. We got here because there was an alternative head to search. 1545 1545 * The resizer waits for all preexisting readers to finish after 1546 * it 1546 * it 1547 1547 */ 1548 1548 assert(h->b->order > h->new_b->order); … … 1565 1565 size_t move_src_idx = grow_idx(shrink_idx(last_old_idx)); 1566 1566 1567 /* 1568 * Last node seen was in the joining bucket - if the searched 1569 * for node is there we will find it. 1567 /* 1568 * Last node seen was in the joining bucket - if the searched 1569 * for node is there we will find it. 1570 1570 */ 1571 if (move_src_idx != last_old_idx) 1571 if (move_src_idx != last_old_idx) 1572 1572 return true; 1573 1573 } 1574 1574 1575 /* 1575 /* 1576 1576 * Reached the end of the bucket but no nodes from the joining bucket 1577 1577 * were seen. There should have at least been a JOIN node so we have … … 1584 1584 1585 1585 /** When resizing returns the bucket head to start the search with in \a phead. 1586 * 1586 * 1587 1587 * If a resize had been detected (eg cht_t.b.head[idx] is marked immutable). 1588 1588 * upd_resizing_head() moves the bucket for \a hash from the old head 1589 1589 * to the new head. Moreover, it splits or joins buckets as necessary. 1590 * 1590 * 1591 1591 * @param h CHT to operate on. 1592 1592 * @param hash Hash of an item whose chain we would like to traverse. … … 1595 1595 * in progress and the bucket may have to traversed again 1596 1596 * as indicated by join_completed(). 1597 * @param[out] walk_mode Specifies how to interpret node marks. 1598 */ 1599 static void upd_resizing_head(cht_t *h, size_t hash, marked_ptr_t **phead, 1597 * @param[out] walk_mode Specifies how to interpret node marks. 1598 */ 1599 static void upd_resizing_head(cht_t *h, size_t hash, marked_ptr_t **phead, 1600 1600 bool *join_finishing, walk_mode_t *walk_mode) 1601 1601 { … … 1621 1621 if (move_dest_idx == new_idx) { 1622 1622 assert(pmoved_head == pnew_head); 1623 /* 1624 * move_head() makes the new head of the moved bucket visible. 1623 /* 1624 * move_head() makes the new head of the moved bucket visible. 1625 1625 * The new head may be marked with a JOIN_FOLLOWS 1626 1626 */ … … 1629 1629 } else { 1630 1630 assert(pmoved_head != pnew_head); 1631 /* 1632 * The hash belongs to the bucket that is the result of splitting 1631 /* 1632 * The hash belongs to the bucket that is the result of splitting 1633 1633 * the old/moved bucket, ie the bucket that contains the second 1634 1634 * half of the split/old/moved bucket. … … 1639 1639 size_t split_hash = calc_split_hash(new_idx, h->new_b->order); 1640 1640 split_bucket(h, pmoved_head, pnew_head, split_hash); 1641 /* 1642 * split_bucket() makes the new head visible. No 1641 /* 1642 * split_bucket() makes the new head visible. No 1643 1643 * JOIN_FOLLOWS in this part of split bucket. 1644 1644 */ … … 1653 1653 size_t move_src_idx = grow_idx(new_idx); 1654 1654 1655 /* 1656 * Complete moving the bucket from the old to the new table. 1655 /* 1656 * Complete moving the bucket from the old to the new table. 1657 1657 * Makes a valid pnew_head visible if already moved. 
1658 1658 */ … … 1667 1667 } 1668 1668 1669 /* 1669 /* 1670 1670 * The resizer sets pold_head to &sentinel when all cpus are 1671 1671 * guaranteed to see the bucket join. … … 1681 1681 *walk_mode = WM_LEAVE_JOIN; 1682 1682 } else { 1683 /* 1684 * Final stage of resize. The resizer is waiting for all 1683 /* 1684 * Final stage of resize. The resizer is waiting for all 1685 1685 * readers to notice that the old table had been replaced. 1686 1686 */ … … 1700 1700 #endif 1701 1701 1702 /** Moves an immutable head \a psrc_head of cht_t.b to \a pdest_head of cht_t.new_b. 1703 * 1702 /** Moves an immutable head \a psrc_head of cht_t.b to \a pdest_head of cht_t.new_b. 1703 * 1704 1704 * The function guarantees the move will be visible on this cpu once 1705 1705 * it completes. In particular, *pdest_head will not be N_INVALID. 1706 * 1706 * 1707 1707 * Unlike complete_head_move(), help_head_move() checks if the head had already 1708 1708 * been moved and tries to avoid moving the bucket heads if possible. 1709 1709 */ 1710 static inline void help_head_move(marked_ptr_t *psrc_head, 1710 static inline void help_head_move(marked_ptr_t *psrc_head, 1711 1711 marked_ptr_t *pdest_head) 1712 1712 { … … 1729 1729 1730 1730 /** Initiates the move of the old head \a psrc_head. 1731 * 1732 * The move may be completed with help_head_move(). 1731 * 1732 * The move may be completed with help_head_move(). 1733 1733 */ 1734 1734 static void start_head_move(marked_ptr_t *psrc_head) … … 1766 1766 cas_order_barrier(); 1767 1767 1768 DBG(ret = ) 1769 cas_link(psrc_head, next, N_CONST, next, N_INVALID); 1768 DBG(ret = ) 1769 cas_link(psrc_head, next, N_CONST, next, N_INVALID); 1770 1770 assert(ret == make_link(next, N_CONST) || (N_INVALID == get_mark(ret))); 1771 1771 cas_order_barrier(); … … 1773 1773 1774 1774 /** Splits the bucket at psrc_head and links to the remainder from pdest_head. 1775 * 1775 * 1776 1776 * Items with hashes greater or equal to \a split_hash are moved to bucket 1777 * with head at \a pdest_head. 1778 * 1777 * with head at \a pdest_head. 1778 * 1779 1779 * @param h CHT to operate on. 1780 1780 * @param psrc_head Head of the bucket to split (in cht_t.new_b). 1781 1781 * @param pdest_head Head of the bucket that points to the second part 1782 1782 * of the split bucket in psrc_head. (in cht_t.new_b) 1783 * @param split_hash Hash of the first possible item in the remainder of 1783 * @param split_hash Hash of the first possible item in the remainder of 1784 1784 * psrc_head, ie the smallest hash pdest_head is allowed 1785 1785 * to point to.. 1786 1786 */ 1787 static void split_bucket(cht_t *h, marked_ptr_t *psrc_head, 1787 static void split_bucket(cht_t *h, marked_ptr_t *psrc_head, 1788 1788 marked_ptr_t *pdest_head, size_t split_hash) 1789 1789 { … … 1794 1794 /* 1795 1795 * L == Last node of the first part of the split bucket. That part 1796 * remains in the original/src bucket. 1796 * remains in the original/src bucket. 1797 1797 * F == First node of the second part of the split bucket. That part 1798 1798 * will be referenced from the dest bucket head. 1799 1799 * 1800 * We want to first mark a clean L as JF so that updaters unaware of 1800 * We want to first mark a clean L as JF so that updaters unaware of 1801 1801 * the split (or table resize): 1802 1802 * - do not insert a new node between L and F … … 1804 1804 * - do not unlink F 1805 1805 * 1806 * Then we can safely mark F as JN even if it has been marked deleted. 
1807 * Once F is marked as JN updaters aware of table resize will not 1806 * Then we can safely mark F as JN even if it has been marked deleted. 1807 * Once F is marked as JN updaters aware of table resize will not 1808 1808 * attempt to unlink it (JN will have two predecessors - we cannot 1809 * safely unlink from both at the same time). Updaters unaware of 1810 * ongoing resize can reach F only via L and that node is already 1809 * safely unlink from both at the same time). Updaters unaware of 1810 * ongoing resize can reach F only via L and that node is already 1811 1811 * marked JF, so they won't unlink F. 1812 * 1812 * 1813 1813 * Last, link the new/dest head to F. 1814 * 1815 * 1816 * 0) ,-- split_hash, first hash of the dest bucket 1817 * v 1814 * 1815 * 1816 * 0) ,-- split_hash, first hash of the dest bucket 1817 * v 1818 1818 * [src_head | N] -> .. -> [L] -> [F] 1819 1819 * [dest_head | Inv] 1820 * 1820 * 1821 1821 * 1) ,-- split_hash 1822 * v 1822 * v 1823 1823 * [src_head | N] -> .. -> [JF] -> [F] 1824 1824 * [dest_head | Inv] 1825 * 1825 * 1826 1826 * 2) ,-- split_hash 1827 * v 1827 * v 1828 1828 * [src_head | N] -> .. -> [JF] -> [JN] 1829 1829 * [dest_head | Inv] 1830 * 1830 * 1831 1831 * 3) ,-- split_hash 1832 * v 1832 * v 1833 1833 * [src_head | N] -> .. -> [JF] -> [JN] 1834 1834 * ^ … … 1845 1845 /* There are nodes in the dest bucket, ie the second part of the split. */ 1846 1846 if (wnd.cur != &sentinel) { 1847 /* 1848 * Mark the first node of the dest bucket as a join node so 1849 * updaters do not attempt to unlink it if it is deleted. 1847 /* 1848 * Mark the first node of the dest bucket as a join node so 1849 * updaters do not attempt to unlink it if it is deleted. 1850 1850 */ 1851 1851 mark_join_node(wnd.cur); 1852 1852 cas_order_barrier(); 1853 1853 } else { 1854 /* 1854 /* 1855 1855 * Second part of the split bucket is empty. There are no nodes 1856 1856 * to mark as JOIN nodes and there never will be. … … 1868 1868 1869 1869 /** Finds and marks the last node of psrc_head w/ hash less than split_hash. 1870 * 1871 * Finds a node in psrc_head with the greatest hash that is strictly less 1872 * than split_hash and marks it with N_JOIN_FOLLOWS. 1873 * 1874 * Returns a window pointing to that node. 1875 * 1876 * Any logically deleted nodes along the way are 1877 * garbage collected; therefore, the predecessor node (if any) will most 1870 * 1871 * Finds a node in psrc_head with the greatest hash that is strictly less 1872 * than split_hash and marks it with N_JOIN_FOLLOWS. 1873 * 1874 * Returns a window pointing to that node. 1875 * 1876 * Any logically deleted nodes along the way are 1877 * garbage collected; therefore, the predecessor node (if any) will most 1878 1878 * likely not be marked N_DELETED. 1879 * 1879 * 1880 1880 * @param h CHT to operate on. 1881 1881 * @param psrc_head Bucket head. … … 1884 1884 * @param[out] wnd Points to the node marked with N_JOIN_FOLLOWS. 1885 1885 */ 1886 static void mark_join_follows(cht_t *h, marked_ptr_t *psrc_head, 1886 static void mark_join_follows(cht_t *h, marked_ptr_t *psrc_head, 1887 1887 size_t split_hash, wnd_t *wnd) 1888 1888 { … … 1896 1896 wnd->cur = get_next(*psrc_head); 1897 1897 1898 /* 1898 /* 1899 1899 * Find the split window, ie the last node of the first part of 1900 1900 * the split bucket and the its successor - the first node of 1901 * the second part of the split bucket. Retry if GC failed. 1901 * the second part of the split bucket. Retry if GC failed. 
1902 1902 */ 1903 1903 if (!find_wnd_and_gc(h, split_hash, WM_MOVE_JOIN_FOLLOWS, wnd, &resizing)) … … 1906 1906 /* Must not report that the table is resizing if WM_MOVE_JOIN_FOLLOWS.*/ 1907 1907 assert(!resizing); 1908 /* 1909 * Mark the last node of the first half of the split bucket 1908 /* 1909 * Mark the last node of the first half of the split bucket 1910 1910 * that a join node follows. It must be clean/normal. 1911 1911 */ … … 1913 1913 = cas_link(wnd->ppred, wnd->cur, N_NORMAL, wnd->cur, N_JOIN_FOLLOWS); 1914 1914 1915 /* 1916 * Successfully marked as a JF node or already marked that way (even 1917 * if also marked deleted - unlinking the node will move the JF mark). 1915 /* 1916 * Successfully marked as a JF node or already marked that way (even 1917 * if also marked deleted - unlinking the node will move the JF mark). 1918 1918 */ 1919 1919 done = (ret == make_link(wnd->cur, N_NORMAL)) … … 1932 1932 mark_t mark = get_mark(join_node->link); 1933 1933 1934 /* 1935 * May already be marked as deleted, but it won't be unlinked 1934 /* 1935 * May already be marked as deleted, but it won't be unlinked 1936 1936 * because its predecessor is marked with JOIN_FOLLOWS or CONST. 1937 1937 */ 1938 marked_ptr_t ret 1938 marked_ptr_t ret 1939 1939 = cas_link(&join_node->link, next, mark, next, mark | N_JOIN); 1940 1940 … … 1946 1946 1947 1947 /** Appends the bucket at psrc_head to the bucket at pdest_head. 1948 * 1948 * 1949 1949 * @param h CHT to operate on. 1950 1950 * @param psrc_head Bucket to merge with pdest_head. … … 1952 1952 * @param split_hash The smallest hash psrc_head may contain. 1953 1953 */ 1954 static void join_buckets(cht_t *h, marked_ptr_t *psrc_head, 1954 static void join_buckets(cht_t *h, marked_ptr_t *psrc_head, 1955 1955 marked_ptr_t *pdest_head, size_t split_hash) 1956 1956 { … … 1959 1959 return; 1960 1960 /* 1961 * F == First node of psrc_head, ie the bucket we want to append 1961 * F == First node of psrc_head, ie the bucket we want to append 1962 1962 * to (ie join with) the bucket starting at pdest_head. 1963 1963 * L == Last node of pdest_head, ie the bucket that psrc_head will 1964 * be appended to. 1965 * 1966 * (1) We first mark psrc_head immutable to signal that a join is 1967 * in progress and so that updaters unaware of the join (or table 1964 * be appended to. 1965 * 1966 * (1) We first mark psrc_head immutable to signal that a join is 1967 * in progress and so that updaters unaware of the join (or table 1968 1968 * resize): 1969 1969 * - do not insert new nodes between the head psrc_head and F 1970 1970 * - do not unlink F (it may already be marked deleted) 1971 * 1971 * 1972 1972 * (2) Next, F is marked as a join node. Updaters aware of table resize 1973 * will not attempt to unlink it. We cannot safely/atomically unlink 1974 * the join node because it will be pointed to from two different 1973 * will not attempt to unlink it. We cannot safely/atomically unlink 1974 * the join node because it will be pointed to from two different 1975 1975 * buckets. Updaters unaware of resize will fail to unlink the join 1976 1976 * node due to the head being marked immutable. … … 1978 1978 * (3) Then the tail of the bucket at pdest_head is linked to the join 1979 1979 * node. From now on, nodes in both buckets can be found via pdest_head. 1980 * 1980 * 1981 1981 * (4) Last, mark immutable psrc_head as invalid. It signals updaters 1982 1982 * that the join is complete and they can insert new nodes (originally 1983 * destined for psrc_head) into pdest_head. 
1984 * 1983 * destined for psrc_head) into pdest_head. 1984 * 1985 1985 * Note that pdest_head keeps pointing at the join node. This allows 1986 1986 * lookups and updaters to determine if they should see a link between 1987 1987 * the tail L and F when searching for nodes originally in psrc_head 1988 * via pdest_head. If they reach the tail of pdest_head without 1988 * via pdest_head. If they reach the tail of pdest_head without 1989 1989 * encountering any nodes of psrc_head, either there were no nodes 1990 1990 * in psrc_head to begin with or the link between L and F did not 1991 1991 * yet propagate to their cpus. If psrc_head was empty, it remains 1992 * NULL. Otherwise psrc_head points to a join node (it will not be 1992 * NULL. Otherwise psrc_head points to a join node (it will not be 1993 1993 * unlinked until table resize completes) and updaters/lookups 1994 1994 * should issue a read_barrier() to make the link [L]->[JN] visible. 1995 * 1996 * 0) ,-- split_hash, first hash of the src bucket 1997 * v 1995 * 1996 * 0) ,-- split_hash, first hash of the src bucket 1997 * v 1998 1998 * [dest_head | N]-> .. -> [L] 1999 * [src_head | N]--> [F] -> .. 1999 * [src_head | N]--> [F] -> .. 2000 2000 * ^ 2001 2001 * ` split_hash, first hash of the src bucket 2002 * 2002 * 2003 2003 * 1) ,-- split_hash 2004 * v 2004 * v 2005 2005 * [dest_head | N]-> .. -> [L] 2006 * [src_head | C]--> [F] -> .. 2007 * 2006 * [src_head | C]--> [F] -> .. 2007 * 2008 2008 * 2) ,-- split_hash 2009 * v 2009 * v 2010 2010 * [dest_head | N]-> .. -> [L] 2011 * [src_head | C]--> [JN] -> .. 2012 * 2011 * [src_head | C]--> [JN] -> .. 2012 * 2013 2013 * 3) ,-- split_hash 2014 * v 2014 * v 2015 2015 * [dest_head | N]-> .. -> [L] --+ 2016 2016 * v 2017 * [src_head | C]-------------> [JN] -> .. 2018 * 2017 * [src_head | C]-------------> [JN] -> .. 2018 * 2019 2019 * 4) ,-- split_hash 2020 * v 2020 * v 2021 2021 * [dest_head | N]-> .. -> [L] --+ 2022 2022 * v 2023 * [src_head | Inv]-----------> [JN] -> .. 2023 * [src_head | Inv]-----------> [JN] -> .. 2024 2024 */ 2025 2025 … … 2038 2038 link_to_join_node(h, pdest_head, join_node, split_hash); 2039 2039 cas_order_barrier(); 2040 } 2040 } 2041 2041 2042 2042 DBG(marked_ptr_t ret = ) … … 2049 2049 2050 2050 /** Links the tail of pdest_head to join_node. 2051 * 2051 * 2052 2052 * @param h CHT to operate on. 2053 2053 * @param pdest_head Head of the bucket whose tail is to be linked to join_node. … … 2057 2057 * (originally) in pdest_head. 2058 2058 */ 2059 static void link_to_join_node(cht_t *h, marked_ptr_t *pdest_head, 2059 static void link_to_join_node(cht_t *h, marked_ptr_t *pdest_head, 2060 2060 cht_link_t *join_node, size_t split_hash) 2061 2061 { … … 2077 2077 if (wnd.cur != &sentinel) { 2078 2078 /* Must be from the new appended bucket. */ 2079 assert(split_hash <= node_hash(h, wnd.cur) 2079 assert(split_hash <= node_hash(h, wnd.cur) 2080 2080 || h->invalid_hash == node_hash(h, wnd.cur)); 2081 2081 return; … … 2083 2083 2084 2084 /* Reached the tail of pdest_head - link it to the join node. */ 2085 marked_ptr_t ret = 2085 marked_ptr_t ret = 2086 2086 cas_link(wnd.ppred, &sentinel, N_NORMAL, join_node, N_NORMAL); 2087 2087 … … 2090 2090 } 2091 2091 2092 /** Instructs RCU to free the item once all preexisting references are dropped. 2093 * 2092 /** Instructs RCU to free the item once all preexisting references are dropped. 2093 * 2094 2094 * The item is freed via op->remove_callback(). 
2095 2095 */ … … 2098 2098 assert(item != &sentinel); 2099 2099 2100 /* 2100 /* 2101 2101 * remove_callback only works as rcu_func_t because rcu_link is the first 2102 2102 * field in cht_link_t. … … 2108 2108 2109 2109 /** Notes that an item had been unlinked from the table and shrinks it if needed. 2110 * 2111 * If the number of items in the table drops below 1/4 of the maximum 2110 * 2111 * If the number of items in the table drops below 1/4 of the maximum 2112 2112 * allowed load the table is shrunk in the background. 2113 2113 */ … … 2129 2129 } 2130 2130 2131 /** Notes an item had been inserted and grows the table if needed. 2132 * 2131 /** Notes an item had been inserted and grows the table if needed. 2132 * 2133 2133 * The table is resized in the background. 2134 2134 */ … … 2192 2192 2193 2193 /* Failed to alloc a new table - try next time the resizer is run. */ 2194 if (!h->new_b) 2194 if (!h->new_b) 2195 2195 return; 2196 2196 … … 2199 2199 size_t old_bucket_cnt = (1 << h->b->order); 2200 2200 2201 /* 2202 * Give updaters a chance to help out with the resize. Do the minimum 2201 /* 2202 * Give updaters a chance to help out with the resize. Do the minimum 2203 2203 * work needed to announce a resize is in progress, ie start moving heads. 2204 2204 */ … … 2227 2227 } 2228 2228 2229 /* 2229 /* 2230 2230 * Wait for all updaters to notice the new heads. Once everyone sees 2231 2231 * the invalid old bucket heads they will know a resize is in progress 2232 * and updaters will modify the correct new buckets. 2232 * and updaters will modify the correct new buckets. 2233 2233 */ 2234 2234 rcu_synchronize(); … … 2241 2241 } 2242 2242 2243 /* 2243 /* 2244 2244 * Wait for everyone to notice that buckets were split, ie link connecting 2245 * the join follows and join node has been cut. 2245 * the join follows and join node has been cut. 2246 2246 */ 2247 2247 rcu_synchronize(); … … 2278 2278 2279 2279 /* Failed to alloc a new table - try next time the resizer is run. */ 2280 if (!h->new_b) 2280 if (!h->new_b) 2281 2281 return; 2282 2282 … … 2286 2286 size_t old_bucket_cnt = (1 << h->b->order); 2287 2287 2288 /* 2289 * Give updaters a chance to help out with the resize. Do the minimum 2288 /* 2289 * Give updaters a chance to help out with the resize. Do the minimum 2290 2290 * work needed to announce a resize is in progress, ie start moving heads. 2291 2291 */ … … 2318 2318 /* This bucket should join the moved bucket. */ 2319 2319 size_t split_hash = calc_split_hash(old_idx, h->b->order); 2320 join_buckets(h, &h->b->head[old_idx], &h->new_b->head[new_idx], 2320 join_buckets(h, &h->b->head[old_idx], &h->new_b->head[new_idx], 2321 2321 split_hash); 2322 2322 } 2323 2323 } 2324 2324 2325 /* 2325 /* 2326 2326 * Wait for all updaters to notice the new heads. Once everyone sees 2327 2327 * the invalid old bucket heads they will know a resize is in progress 2328 * and updaters will modify the correct new buckets. 2328 * and updaters will modify the correct new buckets. 2329 2329 */ 2330 2330 rcu_synchronize(); … … 2389 2389 2390 2390 /** Clears the join_node's N_JOIN mark frees it if marked N_DELETED as well. 
*/ 2391 static void clear_join_and_gc(cht_t *h, cht_link_t *join_node, 2391 static void clear_join_and_gc(cht_t *h, cht_link_t *join_node, 2392 2392 marked_ptr_t *new_head) 2393 2393 { … … 2404 2404 mark_t cleared_mark = get_mark(jn_link) & N_DELETED; 2405 2405 2406 marked_ptr_t ret = 2406 marked_ptr_t ret = 2407 2407 _cas_link(&join_node->link, jn_link, make_link(next, cleared_mark)); 2408 2408 … … 2429 2429 }; 2430 2430 2431 done = find_wnd_and_gc_pred(h, jn_hash, WM_NORMAL, same_node_pred, 2431 done = find_wnd_and_gc_pred(h, jn_hash, WM_NORMAL, same_node_pred, 2432 2432 join_node, &wnd, &resizing); 2433 2433 … … 2452 2452 * Find the non-deleted node with a JF mark and clear the JF mark. 2453 2453 * The JF node may be deleted and/or the mark moved to its neighbors 2454 * at any time. Therefore, we GC deleted nodes until we find the JF 2455 * node in order to remove stale/deleted JF nodes left behind eg by 2456 * delayed threads that did not yet get a chance to unlink the deleted 2457 * JF node and move its mark. 2458 * 2454 * at any time. Therefore, we GC deleted nodes until we find the JF 2455 * node in order to remove stale/deleted JF nodes left behind eg by 2456 * delayed threads that did not yet get a chance to unlink the deleted 2457 * JF node and move its mark. 2458 * 2459 2459 * Note that the head may be marked JF (but never DELETED). 2460 2460 */ … … 2481 2481 if (is_jf_node) { 2482 2482 cht_link_t *next = get_next(*cur_link); 2483 marked_ptr_t ret = 2483 marked_ptr_t ret = 2484 2484 cas_link(cur_link, next, N_JOIN_FOLLOWS, &sentinel, N_NORMAL); 2485 2485 2486 assert(next == &sentinel 2486 assert(next == &sentinel 2487 2487 || ((N_JOIN | N_JOIN_FOLLOWS) & get_mark(ret))); 2488 2488 … … 2491 2491 break; 2492 2492 } else { 2493 /* 2494 * The JF node had been deleted or a new node inserted 2493 /* 2494 * The JF node had been deleted or a new node inserted 2495 2495 * right after it. Retry from the head. 2496 2496 */ … … 2500 2500 } else { 2501 2501 wnd.ppred = cur_link; 2502 wnd.cur = get_next(*cur_link); 2502 wnd.cur = get_next(*cur_link); 2503 2503 } 2504 2504 } … … 2512 2512 } 2513 2513 2514 /** Returns the first possible hash following a bucket split point. 2515 * 2514 /** Returns the first possible hash following a bucket split point. 2515 * 2516 2516 * In other words the returned hash is the smallest possible hash 2517 2517 * the remainder of the split bucket may contain. … … 2558 2558 static inline size_t node_hash(cht_t *h, const cht_link_t *item) 2559 2559 { 2560 assert(item->hash == h->invalid_hash 2560 assert(item->hash == h->invalid_hash 2561 2561 || item->hash == sentinel.hash 2562 2562 || item->hash == calc_node_hash(h, item)); … … 2569 2569 { 2570 2570 assert(item != &sentinel); 2571 /* 2571 /* 2572 2572 * Clear the lowest order bit in order for sentinel's node hash 2573 2573 * to be the greatest possible. … … 2624 2624 2625 2625 /** Compare-and-swaps a next item link. */ 2626 static inline marked_ptr_t cas_link(marked_ptr_t *link, const cht_link_t *cur_next, 2626 static inline marked_ptr_t cas_link(marked_ptr_t *link, const cht_link_t *cur_next, 2627 2627 mark_t cur_mark, const cht_link_t *new_next, mark_t new_mark) 2628 2628 { 2629 return _cas_link(link, make_link(cur_next, cur_mark), 2629 return _cas_link(link, make_link(cur_next, cur_mark), 2630 2630 make_link(new_next, new_mark)); 2631 2631 } 2632 2632 2633 2633 /** Compare-and-swaps a next item link. 
*/ 2634 static inline marked_ptr_t _cas_link(marked_ptr_t *link, marked_ptr_t cur, 2634 static inline marked_ptr_t _cas_link(marked_ptr_t *link, marked_ptr_t cur, 2635 2635 marked_ptr_t new) 2636 2636 { … … 2640 2640 * have to be ordered wrt to other cas(y) to a different location y 2641 2641 * on the same cpu. 2642 * 2643 * cas(x) must act as a write barrier on x, ie if cas(x) succeeds 2644 * and is observed by another cpu, then all cpus must be able to 2642 * 2643 * cas(x) must act as a write barrier on x, ie if cas(x) succeeds 2644 * and is observed by another cpu, then all cpus must be able to 2645 2645 * make the effects of cas(x) visible just by issuing a load barrier. 2646 2646 * For example: 2647 2647 * cpu1 cpu2 cpu3 2648 * cas(x, 0 -> 1), succeeds 2648 * cas(x, 0 -> 1), succeeds 2649 2649 * cas(x, 0 -> 1), fails 2650 2650 * MB, to order load of x in cas and store to y … … 2652 2652 * sees y == 7 2653 2653 * loadMB must be enough to make cas(x) on cpu3 visible to cpu1, ie x == 1. 2654 * 2654 * 2655 2655 * If cas() did not work this way: 2656 2656 * a) our head move protocol would not be correct. 2657 2657 * b) freeing an item linked to a moved head after another item was 2658 2658 * inserted in front of it, would require more than one grace period. 2659 * 2659 * 2660 2660 * Ad (a): In the following example, cpu1 starts moving old_head 2661 2661 * to new_head, cpu2 completes the move and cpu3 notices cpu2 2662 2662 * completed the move before cpu1 gets a chance to notice cpu2 2663 * had already completed the move. Our requirements for cas() 2664 * assume cpu3 will see a valid and mutable value in new_head 2665 * after issuing a load memory barrier once it has determined 2666 * the old_head's value had been successfully moved to new_head 2663 * had already completed the move. Our requirements for cas() 2664 * assume cpu3 will see a valid and mutable value in new_head 2665 * after issuing a load memory barrier once it has determined 2666 * the old_head's value had been successfully moved to new_head 2667 2667 * (because it sees old_head marked invalid). 2668 * 2668 * 2669 2669 * cpu1 cpu2 cpu3 2670 2670 * cas(old_head, <addr, N>, <addr, Const>), succeeds … … 2672 2672 * // Move from old_head to new_head started, now the interesting stuff: 2673 2673 * cas(new_head, <0, Inv>, <addr, N>), succeeds 2674 * 2674 * 2675 2675 * cas(new_head, <0, Inv>, <addr, N>), but fails 2676 2676 * cas-order-barrier 2677 2677 * cas(old_head, <addr, Const>, <addr, Inv>), succeeds 2678 * 2678 * 2679 2679 * Sees old_head marked Inv (by cpu2) 2680 2680 * load-MB 2681 2681 * assert(new_head == <addr, N>) 2682 * 2682 * 2683 2683 * cas-order-barrier 2684 * 2684 * 2685 2685 * Even though cpu1 did not yet issue a cas-order-barrier, cpu1's store 2686 2686 * to new_head (successful cas()) must be made visible to cpu3 with 2687 2687 * a load memory barrier if cpu1's store to new_head is visible 2688 2688 * on another cpu (cpu2) and that cpu's (cpu2's) store to old_head 2689 * is already visible to cpu3. * 2689 * is already visible to cpu3. * 2690 2690 */ 2691 2691 void *expected = (void*)cur; 2692 2692 2693 /* 2693 /* 2694 2694 * Use the acquire-release model, although we could probably 2695 2695 * get away even with the relaxed memory model due to our use
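The standalone sketches below illustrate a few of the techniques the code above relies on; none of them is taken from cht.c, and every sketch_* name is hypothetical. First, the bucket links are manipulated only through make_link(), get_next() and get_mark(). A marked pointer of that kind is typically packed by keeping the mark in the low bits of a sufficiently aligned node pointer, for example:

#include <stdint.h>

/* Hypothetical, simplified stand-ins for the marked-pointer helpers.
 * Assumes nodes are at least 8-byte aligned, leaving 3 low bits free. */
typedef uintptr_t sketch_marked_ptr_t;
typedef unsigned sketch_mark_t;

#define SKETCH_MARK_MASK  ((uintptr_t) 0x7)

static inline sketch_marked_ptr_t sketch_make_link(const void *next,
    sketch_mark_t mark)
{
	/* Pack the pointer and the mark into a single word. */
	return (uintptr_t) next | (mark & SKETCH_MARK_MASK);
}

static inline void *sketch_get_next(sketch_marked_ptr_t link)
{
	/* Strip the mark bits to recover the successor pointer. */
	return (void *) (link & ~SKETCH_MARK_MASK);
}

static inline sketch_mark_t sketch_get_mark(sketch_marked_ptr_t link)
{
	return (sketch_mark_t) (link & SKETCH_MARK_MASK);
}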
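free_later() passes op->remove_callback to RCU and notes that this only works because rcu_link is the first field of cht_link_t, so a pointer to the item and a pointer to its rcu_link are the same address. A minimal standalone illustration of that layout guarantee, with made-up type names rather than the kernel's:

#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for rcu_item_t and cht_link_t. */
typedef struct sketch_rcu_item {
	struct sketch_rcu_item *next;
} sketch_rcu_item_t;

typedef struct sketch_cht_link {
	sketch_rcu_item_t rcu_link;  /* must remain the first member */
	size_t hash;
} sketch_cht_link_t;

/* Callback written in terms of the whole item... */
static void sketch_remove_callback(sketch_cht_link_t *item)
{
	free(item);
}

/* ...may be reached through an rcu_item pointer, because with rcu_link
 * at offset 0 both pointers refer to the same address. (The kernel passes
 * the callback directly as an rcu_func_t; a trampoline like this expresses
 * the same layout assumption without a function-pointer cast.) */
static void sketch_rcu_trampoline(sketch_rcu_item_t *rcu_item)
{
	static_assert(offsetof(sketch_cht_link_t, rcu_link) == 0,
	    "rcu_link must be the first member");
	sketch_remove_callback((sketch_cht_link_t *) rcu_item);
}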
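item_inserted() and item_removed() only trigger background resizes: the table grows once the item count exceeds the maximum allowed load for the current number of buckets and shrinks once it drops below a quarter of that load, as described above. A hedged sketch of those threshold checks follows; the constant and the minimum-order guard are assumptions made for the example, not values read out of cht.c:

#include <stdbool.h>
#include <stddef.h>

/* Assumed average number of items per bucket tolerated before growing. */
#define SKETCH_MAX_LOAD  4

static bool sketch_should_grow(size_t items, size_t order)
{
	size_t bucket_cnt = (size_t) 1 << order;
	/* Exceeded the maximum allowed load: grow in the background. */
	return items > bucket_cnt * SKETCH_MAX_LOAD;
}

static bool sketch_should_shrink(size_t items, size_t order, size_t min_order)
{
	size_t bucket_cnt = (size_t) 1 << order;
	/* Dropped below 1/4 of the maximum allowed load: shrink, but never
	 * below the smallest supported table. */
	return order > min_order && items < bucket_cnt * SKETCH_MAX_LOAD / 4;
}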
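calc_split_hash() must return the smallest hash the second half of a split bucket may contain. If, as the hash ordering within buckets and the all-ones sentinel hash suggest, a bucket index is taken from the most significant order bits of a hash, the split hash can be computed as below. This is an inference from the documented semantics only, not the kernel's implementation:

#include <limits.h>
#include <stddef.h>

/* Smallest hash belonging to bucket split_idx in a table of 2^order
 * buckets, assuming the index comes from the top `order` bits of the
 * hash (an assumption, see above). Requires 0 < order <= word width. */
static size_t sketch_calc_split_hash(size_t split_idx, size_t order)
{
	return split_idx << (sizeof(size_t) * CHAR_BIT - order);
}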
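Finally, _cas_link() is documented to need acquire-release semantics and to hand back the value it observed so callers can tell whether the swap took effect. One way to obtain such a compare-and-swap with standard C11 atomics, shown purely as an illustration of the required ordering rather than as the kernel's primitive:

#include <stdatomic.h>
#include <stdint.h>

typedef uintptr_t sketch_marked_ptr_t;

/* Atomically replaces *link with new_val if it still equals cur and
 * returns the value that was observed in *link. The caller detects
 * success by comparing the return value against cur. */
static inline sketch_marked_ptr_t sketch_cas_link(
    _Atomic sketch_marked_ptr_t *link, sketch_marked_ptr_t cur,
    sketch_marked_ptr_t new_val)
{
	sketch_marked_ptr_t observed = cur;

	atomic_compare_exchange_strong_explicit(link, &observed, new_val,
	    memory_order_acq_rel, memory_order_acquire);

	/* On failure, observed was updated to the current contents of *link;
	 * on success it still equals cur, which was also the old value. */
	return observed;
}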