Opened 12 years ago

Last modified 5 years ago

#414 new enhancement

Graceful system shutdown

Reported by: Jakub Jermář
Owned by: Jakub Jermář
Priority: major
Milestone:
Component: helenos-infrastructure
Version: mainline
Keywords: gsoc12, gsoc13, gsoc14
Cc:
Blocker for:
Depends on:
See also:

Description (last modified by Martin Decky)

Design and implement graceful shutdown of HelenOS.

Details
The current support for shutdown in HelenOS is rather minimal. It is possible to halt the CPUs or reboot the machine. What is missing is a graceful way to tell running tasks that the system is about to be shut down. For example, the reboot sequence currently consists of forcefully killing all existing tasks. The goal is to design and implement a way to notify tasks of an imminent shutdown (or reboot) so that they can terminate in a clean and consistent way. The design decisions must reflect microkernel-specific issues, such as the order in which vital services (e.g. VFS, the naming service or drivers) are shut down.
What Gains and Benefits will this bring?
The benefits of this task come at a rather low level but are nevertheless very important. Graceful shutdown means that drivers could terminate communication with hardware in a predictable manner and that file system servers would be able to unmount file systems cleanly.
Difficulty
Medium to High. The solution will require work both in kernel and in userspace.
Required skills
A successful applicant will have good C programming skills and the ability to survive in a non-standard, non-POSIX application environment.
Documentation
Possible mentors
HelenOS Core Team, Vojtech Horky

Change History (13)

comment:1 by Jakub Jermář, 12 years ago

Keywords: gsoc12 added; 2012 removed

comment:2 by Jakub Jermář, 12 years ago

Keywords: needswork added

comment:3 by Vojtech Horky, 12 years ago

Description: modified (diff)

comment:4 by Jakub Jermář, 12 years ago

Component: helenos/unspecified → helenos/kernel/generic
Owner: set to Jakub Jermář

comment:5 by Vojtech Horky, 12 years ago

Description: modified (diff)
Keywords: needswork removed

comment:6 by Jiri Svoboda, 12 years ago

The key point here is determining which parts of the system need to be shut down and in which order. The strategy could be static or highly dynamic, based on known run-time service inter-dependencies. This process is, to a degree, similar to the boot sequence in reverse.

Random observations:

  • User tasks (tasks created as part of a login session, if we had one) need to be terminated
  • User tasks need to be given a chance to terminate gracefully
  • File systems need to be unmounted or re-mounted read-only (useful especially for the root fs where letting go of it / coping with a forced unmount might be difficult)
  • The dependencies are often complex, circular and dynamic. Rather than designing ad-hoc approaches (e.g. order of unmounting file systems), it might be more clever to put servers into a shutdown mode, where any service that is not busy (in use) is torn down. That may free other service(s) from use and they can be shut down, etc.
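The "shutdown mode" idea from the last bullet can be sketched as a small simulation. This is only an illustration under stated assumptions: `service_t`, `shutdown_idle_services` and the use-count model are hypothetical and do not correspond to any existing HelenOS interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SERVICES 8

/* Hypothetical model of a service; not a HelenOS structure. */
typedef struct {
	const char *name;
	int users;               /* clients still holding this service */
	int uses[MAX_SERVICES];  /* indices of the services this one depends on */
	size_t uses_count;
	bool stopped;
} service_t;

/*
 * Repeatedly stop every service that is not in use and release the
 * services it was using. Returns the number of services stopped and
 * records the order in which they were stopped in order[].
 */
size_t shutdown_idle_services(service_t *svc, size_t n, size_t *order)
{
	assert(n <= MAX_SERVICES);

	size_t stopped = 0;
	bool progress = true;

	while (progress) {
		progress = false;
		for (size_t i = 0; i < n; i++) {
			if (svc[i].stopped || svc[i].users > 0)
				continue;

			svc[i].stopped = true;
			order[stopped++] = i;

			/* Stopping a service drops its references to its providers. */
			for (size_t j = 0; j < svc[i].uses_count; j++)
				svc[svc[i].uses[j]].users--;

			progress = true;
		}
	}

	return stopped;
}
```

If the function returns fewer than `n`, the remaining services are still in use, for example because they form a dependency cycle. This sketch does not resolve that case; those services would need some forced handling.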

comment:7 by Jakub Jermář, 11 years ago

Keywords: gsoc13 added

comment:8 by Martin Decky, 11 years ago

Component: helenos/kernel/generic → helenos-infrastructure

comment:9 by Martin Decky, 11 years ago

Description: modified (diff)

comment:10 by Vojtech Horky, 10 years ago

Keywords: gsoc14 added

comment:11 by Jiri Svoboda, 5 years ago

So far this wasn't very important. Now, with the introduction of an actual persistent file system (/w), it is really important to have at least an interface for shutting down the system, even if the implementation is as simple as doing a 'vol eject /w'.

comment:12 by Jiri Svoboda, 5 years ago

Currently, even after ejecting the system volume, rebooting via the kernel console is not safe. If there is a device in the system, such as USB, that has no kernel driver, when the kernel kills the userspace driver, the system can get caught up in an endless loop of unhandled USB interrupts (i.e. printing spurious interrupt messages).

comment:13 by Jiri Svoboda, 5 years ago

We need to handle both file systems and device drivers. For file systems we need to do one of:

  • regular unmount
  • force unmount
  • remount read-only

Similarly, devices can be handled in one of three ways:

  • regular detach
  • force detach
  • quiesce

Not sure which approach is the best. The regular unmount/detach has the additional complication that you need to take care of all the consumers first - they need to be made to release the resources willingly (e.g. shutting down system services) or forcibly (e.g. killing user sessions).

The remount RO/quiesce approach needs additional work in the driver/FS, but no action is needed on the client side.
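To make the trade-off easier to compare, the three options could be written down as a small table in code. This is a rough sketch only: the type and function names are hypothetical, and the entries marked as assumptions are not stated anywhere in this ticket.

```c
#include <stdbool.h>

/* Hypothetical names; the three options per resource kind come from the lists above. */
typedef enum {
	STRAT_REGULAR,  /* regular unmount / regular detach */
	STRAT_FORCE,    /* force unmount / force detach */
	STRAT_QUIESCE   /* remount read-only / quiesce */
} shutdown_strategy_t;

typedef struct {
	bool clients_must_release;  /* consumers have to let go of the resource first */
	bool needs_server_work;     /* extra support is needed in the driver/FS */
} strategy_cost_t;

strategy_cost_t strategy_cost(shutdown_strategy_t s)
{
	switch (s) {
	case STRAT_REGULAR:
		/* Consumers must release first; "no extra server work" is an assumption. */
		return (strategy_cost_t){ .clients_must_release = true, .needs_server_work = false };
	case STRAT_QUIESCE:
		/* Extra work in the driver/FS, but no action on the client side. */
		return (strategy_cost_t){ .clients_must_release = false, .needs_server_work = true };
	case STRAT_FORCE:
	default:
		/* Assumption: the cost of the forced variant is not discussed in the ticket. */
		return (strategy_cost_t){ .clients_must_release = false, .needs_server_work = false };
	}
}
```

A shutdown coordinator could consult such a table to decide whether user sessions and system services must be stopped before a given file system or device is handled.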
