Opened 13 years ago

#433 new enhancement

Restartable system calls

Reported by: Jiri Svoboda Owned by: Jakub Jermář
Priority: major Milestone:
Component: helenos/kernel/generic Version: mainline
Keywords: Cc:
Blocker for: Depends on:
See also:

Description

There are several scenarios where we want to examine or set/modify the state of a thread, including checkpoint and resume, migration and debugging. There are certain points in the thread's execution where its state is simple and well-known to the system, such as when crossing the boundary between user and kernel space.

However, some system calls can block indefinitely, and we want to be able to access the thread state upon user request within some reasonable time limit (e.g. as soon as the allotted time quantum expires, within one clock tick, etc.).

In general the state of a thread running in the kernel is more complex and could be described by the contents of the kernel stack and current register state. This state representation is much larger and is not portable between system restarts and different system nodes (in the hypothetical scenario of migrating between cluster nodes).

Fortunately, SPARTAN system calls have relatively simple behavior. Most of them simply perform some action (side effect), but do not block indefinitely under normal conditions. For such system calls we can simply wait until they complete.

Most system calls that are designed to sleep (thread_usleep, thread_udelay, futex_sleep, ipc_wait) only have a visible side effect after they finish sleeping. Therefore they could be modified to be abortable/restartable. This would mean adding an extra return status to the sleeping primitives they use and adjusting the functions in the call stack to handle that status and unwind the operation.
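
To make the idea concrete, here is a minimal sketch in C of a sleeping primitive with an extra "interrupted" status, and a system call that unwinds and asks to be restarted. It is a toy model, not HelenOS code: every name in it (toy_thread_t, toy_sleep, toy_sys_wait, SYS_RESTART, etc.) is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes for a sleeping primitive (not HelenOS names). */
typedef enum {
    SLEEP_DONE,        /* the awaited event occurred */
    SLEEP_INTERRUPTED  /* sleep aborted so the thread state can be examined */
} sleep_result_t;

/* Hypothetical system call return codes. */
typedef enum {
    SYS_OK,
    SYS_RESTART        /* no side effect happened; re-issue the call later */
} sys_result_t;

/* Toy thread: pending stop requests and completed side effects. */
typedef struct {
    int stop_requests;  /* times a debugger/checkpointer wants the thread */
    int side_effects;   /* visible effects produced by completed calls */
    bool lock_held;     /* stands in for any state set up before sleeping */
} toy_thread_t;

/* Sleeping primitive with the extra return status. */
static sleep_result_t toy_sleep(toy_thread_t *t)
{
    if (t->stop_requests > 0) {
        t->stop_requests--;
        return SLEEP_INTERRUPTED;
    }
    return SLEEP_DONE;
}

/* System call body: unwinds its preparation and reports SYS_RESTART on abort. */
static sys_result_t toy_sys_wait(toy_thread_t *t)
{
    t->lock_held = true;              /* preparation before sleeping */
    sleep_result_t rc = toy_sleep(t);
    t->lock_held = false;             /* unwind on both paths */
    if (rc == SLEEP_INTERRUPTED)
        return SYS_RESTART;           /* nothing visible has happened yet */
    t->side_effects++;                /* effect becomes visible only now */
    return SYS_OK;
}

/* Boundary loop: after each SYS_RESTART the thread state is simple again. */
static int toy_call_until_done(toy_thread_t *t)
{
    int restarts = 0;
    while (toy_sys_wait(t) == SYS_RESTART) {
        assert(!t->lock_held);        /* safe point for debugger/checkpoint */
        restarts++;
    }
    return restarts;
}
```

In this model the point just after SYS_RESTART is returned plays the role of the user/kernel boundary: nothing is held and no side effect has become visible, so the call can simply be issued again.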

The system calls that might cause problems are ipc_call_sync_{fast|slow}. They first send a message and then wait for an answer. Either they need to be eliminated (if they are not really needed) or some special provision must be made for handling them.
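
A toy model in C of why a send-then-wait call is awkward to restart, assuming (as an illustration, not a statement about the actual implementation) that the send becomes visible before the wait begins. All names (toy_conn_t, toy_call_sync, CALL_RESTART, etc.) are hypothetical.

```c
#include <assert.h>

/* Toy model of a synchronous call: send first, then wait for the answer. */
typedef struct {
    int messages_sent;  /* visible side effect of the send phase */
    int interruptions;  /* pending requests to abort the wait phase */
} toy_conn_t;

typedef enum { CALL_ANSWERED, CALL_RESTART } call_result_t;

static call_result_t toy_call_sync(toy_conn_t *c)
{
    c->messages_sent++;          /* side effect happens before sleeping */
    if (c->interruptions > 0) {
        c->interruptions--;
        return CALL_RESTART;     /* aborting here cannot unsend the message */
    }
    return CALL_ANSWERED;
}

/* Naive restart loop: re-issues the whole call, including the send. */
static int toy_naive_restart(toy_conn_t *c)
{
    while (toy_call_sync(c) == CALL_RESTART)
        ;
    return c->messages_sent;
}
```

With one interruption pending, the naive loop sends the message twice, which is why these calls cannot be treated like the purely sleeping ones.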

This enhancement would simplify Udebug: it would no longer need 'stoppable' sections, because we could restart the system calls instead and stop at a specific point in the syscall code. It would also allow a simpler and more robust implementation of thread checkpoint/resume (compared to the implementation in the thesis Task snapshotting in HelenOS), one that would allow safe checkpoint/resume across different kernel revisions and different cluster nodes.

An alternative approach, and one possibility for solving the problem of synchronous IPC calls, would be to have a few extra, well-defined checkpoint/resume (CPR) points in the kernel. This does not seem as elegant as the solution proposed above, though.
