First, we examine the overhead of RCU read sections compared to acquiring a spinlock.
The figure above shows the number of traversals of a five-element immutable list
depending on the number of threads/CPUs used. More is better ;-).
- //ideal// - the list was accessed without any synchronization whatsoever
- //a-rcu// - each list traversal was protected by A-RCU
- //podzimek-rcu// - each traversal was protected by the preemptible modification of Podzimek's RCU
- //spinlock// - each traversal was guarded by an ordinary preemption-disabling spinlock

A-RCU fares best and scales optimally. At the other extreme, the spinlock exhibits
negative scaling (i.e. the more CPUs you throw at it, the slower it gets). Podzimek's RCU
scales perfectly as well, but has a higher base cost than A-RCU. In particular,
Podzimek-RCU's base cost is on par with a spinlock's when running on a single CPU, while
A-RCU's base cost is significantly lower than both Podzimek-RCU's and the spinlock's.

To reproduce these results, switch to the kernel console and run:
{{{
chtbench 2 1 0 -w
chtbench 2 2 0 -w
chtbench 2 3 0 -w
chtbench 2 4 0 -w
chtbench 3 1 0 -w
chtbench 3 2 0 -w
chtbench 3 3 0 -w
chtbench 3 4 0 -w
chtbench 4 1 0 -w
chtbench 4 2 0 -w
chtbench 4 3 0 -w
chtbench 4 4 0 -w
}}}

[[Image(r1589-list-upd.png)]]
[[Image(r1589-list-upd-trim.png)]]
[[Image(r1589-ht-lookup.png)]]
[[Image(r1589-ht-upd.png)]]