Merge scheduler refactoring to remove the need for thread structure lock

All necessary synchronization is already a product of other operations
that enforce ordering (that is, runqueue manipulation and
thread_sleep()/thread_wakeup()). Some fields formally become atomic,
which is only
needed because they are read from other threads to print out statistics.
These atomic operations are limited to relaxed individual reads/writes
to native-sized fields, which should at least in theory be compiled
identically to regular volatile variable accesses, the only difference
being that concurrent accesses from different threads no longer
constitute a data race, and are therefore not undefined behavior.
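
As a minimal sketch of what such an access pattern can look like in
C11 (the structure, field, and function names below are hypothetical
and are not taken from the actual change): a statistics field is
written only by its owning thread and read by others, all with relaxed
ordering.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical accounting structure, used for illustration only. */
typedef struct {
	atomic_ulong cycles;  /* written by the owner, read by statistics code */
} thread_stats_t;

/* Assumption: a native-sized atomic is lock-free on the target. */
static_assert(ATOMIC_LONG_LOCK_FREE == 2, "atomic_ulong must be lock-free");

/*
 * Called only by the thread that owns the field. Because there is a
 * single writer, a relaxed load followed by a relaxed store is enough;
 * no atomic read-modify-write instruction is required.
 */
static inline void stats_add_cycles(thread_stats_t *s, unsigned long delta)
{
	unsigned long old = atomic_load_explicit(&s->cycles, memory_order_relaxed);
	atomic_store_explicit(&s->cycles, old + delta, memory_order_relaxed);
}

/*
 * May be called from any thread. The value can be slightly stale, which
 * is acceptable for printing statistics, but the read is well defined.
 */
static inline unsigned long stats_read_cycles(thread_stats_t *s)
{
	return atomic_load_explicit(&s->cycles, memory_order_relaxed);
}
```

On common targets these helpers are expected to compile to plain loads
and stores. They provide no ordering guarantees of their own.
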
Additionally, it is now possible to switch directly to the new thread's
context instead of going through a separate scheduler stack. A separate
context is only needed and used when no runnable thread is immediately
available, which means we optimize switching in the limiting case where
many threads are waiting for execution. Switching is also avoided
altogether when there's only one runnable thread and it is being
preempted. Previously, the scheduler would switch to a separate stack,
requeue the thread that was running, retrieve that same thread from
the queue, and switch to it again; all that work is now avoided.
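
The three cases described above can be summarized in a small decision
function. This is only an illustrative sketch: the enum, the function,
and its parameters are hypothetical and do not correspond to names in
the actual scheduler code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of a rescheduling decision. */
typedef enum {
	SWITCH_NONE,         /* keep running the current thread */
	SWITCH_DIRECT,       /* switch straight to the next thread's context */
	SWITCH_TO_SCHED_CTX  /* fall back to the separate scheduler context */
} switch_kind_t;

/*
 * current_runnable: true if the outgoing thread is merely being
 *                   preempted, false if it is going to sleep or exiting.
 * queued:           number of other threads waiting in the runqueue.
 */
static switch_kind_t pick_switch(bool current_runnable, size_t queued)
{
	/* Another thread is ready: no detour through a scheduler stack. */
	if (queued > 0)
		return SWITCH_DIRECT;

	/* Sole runnable thread being preempted: nothing to switch to. */
	if (current_runnable)
		return SWITCH_NONE;

	/* Nothing is runnable right now: wait in the separate context. */
	return SWITCH_TO_SCHED_CTX;
}
```

Under these assumptions, pick_switch(true, 0) yields SWITCH_NONE, which
is the case where the old requeue-and-reselect round trip is skipped.
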