| 1 | = Scalable Concurrent Hash Table = |
| 2 | * GSoC 2012, ticket #398 |
| 3 | * mentor: Jakub Jermar |
| 4 | * student: Adam Hraska |
| 5 | * repository: lp:~adam-hraska+lp/helenos/rcu/ |
| 6 | * benchmarking repository: lp:~adam-hraska+lp/helenos/cht-bench/ |
| 7 | |
| 8 | The goal of the project is to create a scalable concurrent resizable hash table (CHT): |
| 9 | * scalable = performance improves with added cpus |
| 10 | * concurrent = the table allows parallel access from multiple threads/cpus preferably without blocking |
| 11 | * resizable = the table grows and shrinks automatically as item are added or removed from it |
| 12 | Moreover, the table must be suitable to replace the current global page hash table in the kernel. |
| 13 | Optionally, it may be ported to user space as well. |
| 14 | |
| 15 | == Current Status == |
| 16 | {{{#!comment |
| 17 | |
| 18 | supported platforms, what needs to be implemented in others (atomics, ipis vs config_smp) |
| 19 | NMI safe or not |
| 20 | todos |
| 21 | }}} |
| 22 | |
| 23 | Todos: |
| 24 | * |
| 25 | |
| 26 | == Implementation == |
| 27 | === CHT Algorithm === |
| 28 | === RCU Algorithms === |
| 29 | |
| 30 | == Testing == |
| 31 | The CHT replies on working atomic operations, SMP calls (ie IPIs), RCU and the system work queue. |
| 32 | In order to test these out, pull and build the source code (eg on x86-64 machines): |
| 33 | {{{ |
| 34 | bzr checkout lp:~adam-hraska+lp/helenos/rcu/ |
| 35 | make PROFILE=amd64 |
| 36 | }}} |
| 37 | Boot HelenOS with eg 4 CPUs in QEMU: |
| 38 | {{{ |
| 39 | qemu-system-x86_64 -cdrom image.iso -smp4 |
| 40 | }}} |
| 41 | Once fully operational, switch to the kernel console by pressing F12 |
| 42 | (or typing kcon in the shell). You can then finally run the following |
| 43 | relevant tests: |
| 44 | {{{ |
| 45 | test atomic1 |
| 46 | test smpcall1 |
| 47 | test workqueue |
| 48 | test rcu1 |
| 49 | test cht |
| 50 | }}} |
| 51 | Neither of the tests should panic, crash, leak memory, loop infinitely, or report errors. |
| 52 | The only test //workqueue// reports an error which is however expected and marked appropriately |
| 53 | with line //Stress testing a custom queue. Stops prematurely. Errors are expected.// |
| 54 | |
| 55 | You may be interested in runtime statistics gathered by RCU or the system work queue. |
| 56 | Run these commands in the kernel console: |
| 57 | {{{ |
| 58 | rcu |
| 59 | workq |
| 60 | }}} |
| 61 | |
| 62 | == Benchmarks == |
| 63 | In order to benchmark the CHT or RCU implementation, pull sources of the benchmarking |
| 64 | branch and build the code in a release build (ie clear the debug build option): |
| 65 | {{{ |
| 66 | bzr checkout lp:~adam-hraska+lp/helenos/cht-bench/ |
| 67 | make PROFILE=amd64 |
| 68 | make config # this is where you have to clear the debug build option |
| 69 | make |
| 70 | }}} |
| 71 | Boot HelenOS in QEMU with eq 4 CPUS and list the available benchmarks in the kernel console with the |
| 72 | //chtbench// command: |
| 73 | {{{ |
| 74 | qemu-system-x86_64 -cdrom image.iso -smp 4 |
| 75 | |
| 76 | / # kcon |
| 77 | kconsole> chtbench 0 0 0 |
| 78 | }}} |
| 79 | |
| 80 | === Results === |
| 81 | The following results were obtained by benchmarking [http://bazaar.launchpad.net/~adam-hraska+lp/helenos/cht-bench/revision/1589 r1589] |
| 82 | on Core(TM) i7 CPU 920 @ 2.67GHz (4 cores, each with 2 !HyperThreads), running |
| 83 | a release build (with networking): |
| 84 | {{{contrib/conf/net-qe.sh e1k -smp 4 }}} |
| 85 | Each benchmark run consists of repeating a certain operation in a loop for 10 seconds |
| 86 | on the specified number of cpus (ie individual threads wired to these cpus). |
| 87 | Benchmarks were repeated as many times as needed (between 10 and 20 consecutive |
| 88 | runs) in order to obtain a confidence interval of the number of completed operations |
| 89 | within about 1% of the average number of completed operations of the runs. The following |
| 90 | figures include error bars depicting a single standard deviation of the completed number |
| 91 | of operations in a single run (ie it is //not// the SD of the mean, aka standard error). |
| 92 | However the estimated SD is typically below 1-2% of the mean (ie tiny) and therefore not |
| 93 | visible in the figures without zooming ;-). |