First, we examine the overhead of RCU read sections compared to acquiring a spinlock.
The figure above shows the number of traversals of a five-element immutable list
depending on the number of threads/CPUs used. More is better ;-). A sketch of each
variant's read side follows the list.
- //ideal// - the list was accessed without any synchronization whatsoever
- //a-rcu// - each list traversal was protected by A-RCU
- //podzimek-rcu// - protected by the preemptible modification of Podzimek's RCU
- //spinlock// - guarded by an ordinary preemption-disabling spinlock
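
To make the variants concrete, here is a minimal sketch of the read side each
configuration measures. This is illustrative only, written against HelenOS-style
primitives (`rcu_read_lock()`, `spinlock_lock()`); the list type and function
names are assumptions, not the benchmark's actual code.
{{{
/* Illustrative sketch only -- not the actual benchmark code.
 * Assumes HelenOS-style kernel primitives; the list type and
 * helper names are made up for this example. */
typedef struct item {
	int value;
	struct item *next;
} item_t;

/* ideal: walk the list with no synchronization at all. */
static int traverse_ideal(item_t *head)
{
	int sum = 0;
	for (item_t *it = head; it; it = it->next)
		sum += it->value;
	return sum;
}

/* a-rcu / podzimek-rcu: the same walk inside an RCU read section. */
static int traverse_rcu(item_t *head)
{
	rcu_read_lock();
	int sum = 0;
	for (item_t *it = head; it; it = it->next)
		sum += it->value;
	rcu_read_unlock();
	return sum;
}

/* spinlock: the walk holds a preemption-disabling spinlock. */
static int traverse_spinlock(item_t *head, spinlock_t *lock)
{
	spinlock_lock(lock);
	int sum = 0;
	for (item_t *it = head; it; it = it->next)
		sum += it->value;
	spinlock_unlock(lock);
	return sum;
}
}}}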

A-RCU fares the best and scales optimally. By contrast, the spinlock exhibits
negative scaling (i.e., the more CPUs you throw at it, the slower it gets).
Podzimek's RCU scales perfectly as well, but has a greater base cost than A-RCU.
In particular, on a single CPU Podzimek-RCU's base cost is on par with the
spinlock's, while A-RCU's base cost is significantly lower than both.

To reproduce these results, switch to the kernel console and run:
{{{
chtbench 2 1 0 -w
chtbench 2 2 0 -w
chtbench 2 3 0 -w
chtbench 2 4 0 -w
chtbench 3 1 0 -w
chtbench 3 2 0 -w
chtbench 3 3 0 -w
chtbench 3 4 0 -w
chtbench 4 1 0 -w
chtbench 4 2 0 -w
chtbench 4 3 0 -w
chtbench 4 4 0 -w
}}}
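
Each run reports throughput: worker threads repeatedly traverse the structure,
and the summed traversal counts are what the plots show. Conceptually, a worker
looks roughly like this (a hypothetical skeleton, not chtbench's actual source;
the stop flag and output parameter are assumptions):
{{{
/* Hypothetical worker skeleton -- chtbench's real implementation
 * may differ. Each thread counts how many traversals it completes
 * until told to stop; per-thread counts are then summed. */
static volatile int stop_flag = 0;

static void worker(item_t *head, size_t *ops_out)
{
	size_t ops = 0;
	while (!stop_flag) {
		traverse_rcu(head); /* or traverse_spinlock()/traverse_ideal() */
		ops++;
	}
	*ops_out = ops;
}
}}}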

[[Image(r1589-list-upd.png)]]
[[Image(r1589-list-upd-trim.png)]]
[[Image(r1589-ht-lookup.png)]]
[[Image(r1589-ht-upd.png)]]