Chapter 4. Time Management

Table of Contents

4.1. System Clock
4.2. Timeouts
4.3. Generic Clock Interrupt Handler
4.4. Time Source for Userspace

Time is one of the dimensions in which kernel as well as the whole system operates. It is of special importance to many kernel subsytems. Knowledge of time makes it possible for the scheduler to preemptively plan threads for execution. Different parts of the kernel can request execution of their callback function with a specified delay. A good example of such kernel code is the synchronization subsystem which uses this functionality to implement timeouting versions of synchronization primitives.

4.1. System Clock

Every hardware architecture supported by HelenOS must support some kind of a device that can be programmed to yield periodic time signals (i.e. clock interrupts). Some architectures have external clock that is merely programmed by the kernel to interrupt the processor multiple times in a second. This is the case of ia32 and amd64 architectures[4], which use i8254 or a compatible chip to achieve the goal.

Other architectures' processors typically contain two registers. The first register is usually called a compare or a match register and can be set to an arbitrary value by the operating system. The contents of the compare register then stays unaltered until it is written by the kernel again. The second register, often called a counter register, can be also written by the kernel, but the processor automatically increments it after every executed instruction or in some fixed relation to processor speed. The point is that a clock interrupt is generated whenever the values of the counter and the compare registers match. Sometimes, the scheme of two registers is modified so that only one register is needed. Such a register, called a decrementer, then counts towards zero and an interrupt is generated when zero is reached.

In any case, the initial value of the decrementer or the initial difference between the counter and the compare registers, respectively, must be set accordingly to a known relation between the real time and the speed of the decrementer or the counter register, respectively.

The rest of this section will, for the sake of clarity, focus on the two-register scheme. The decrementer scheme is very similar.

The kernel must reinitialize one of the two registers after each clock interrupt in order to schedule next interrupt. However this step is tricky and must be done with caution. Imagine that the clock interrupt is masked either because the kernel is servicing another interrupt or because the processor locally disabled interrupts for a while. If the clock interrupt occurs during this period, it will be pending until the interrupts are enabled again. Theoretically, it could happen an arbitrary counter register ticks later. Which is worse, the ideal time period between two non-delayed clock interrupts can also elapse arbitrary number of times before the delayed interrupt gets serviced. The architecture-specific part of the clock interrupt driver must avoid time drifts caused by such behaviour by taking proactive counter-measures.

Let us assume that the kernel wants each clock interrupt be generated every TICKCONST ticks. This value represents the ideal number of ticks between two non-delayed clock interrupts and has some known relation to real time. On each clock interrupt, the kernel computes and writes down the expected value of the counter register as it hopes to read it on the next clock interrupt. When that interrupt comes, the kernel reads the counter register again and compares it with the written down value. If the difference is smaller than or equal to TICKCONST, then the time drift is none or small and the next interrupt is scheduled earlier with a penalty of so many ticks as is the value of the difference. However, if the difference is bigger, then at least one clock signal was missed. In that case, the missed clock signal is remembered in the special counter. If there are more missed signals, each of them is recorded there. The next interrupt is scheduled with respect to the difference similarily to the former case. This time, the penalty is taken modulo TICKCONST. The effect of missed clock signals is remedied in the generic clock interrupt handler.

4.2. Timeouts

Kernel subsystems can register a callback function to be executed with a specified delay. Such a registration is represented by a kernel structure called timeout. Timeouts are registered via timeout_register function. This function takes a pointer to a timeout structure, a callback function, a parameter of the callback function and a delay in microseconds as parameters. After the structure is initialized with all these values, it is sorted into the processor's list of active timeouts, according to the number of clock interrupts remaining to their expiration and relatively to already listed timeouts.

Timeouts can be unregistered via timeout_unregister. This function can, as opposed to timeout_register, fail when it is too late to remove the timeout from the list of active timeouts.

Timeouts are nearing their expiration in the list of active timeouts which exists on every processor in the system. The expiration counters are decremented on each clock interrupt by the generic clock interrupt handler. Due to the relative ordering of timeouts in the list, it is sufficient to decrement expiration counter only of the first timeout in the list. Timeouts with expiration counter equal to zero are removed from the list and their callback function is called with respective parameter.

4.3. Generic Clock Interrupt Handler

On each clock interrupt, the architecture specific part of the clock interrupt handler makes a call to the generic clock interrupt handler implemented by the clock function. The generic handler takes care of several mission critical goals:

  • expiration of timeouts,

  • updating time of the day counters for userspace and

  • preemption of threads.

The clock function checks for expired timeouts and decrements unexpired timeout expiration counters exactly one more times than is the number of missed clock signals (i.e. at least once and possibly more times, depending on the missed clock signals counter). The time of the day counters are also updated one more times than is the number of missed clock signals. And finally, the remaining timeslice of the running thread is decremented with respect to this counter as well. By considering its value, the kernel performs actions that would otherwise be lost due to an occasional excessive time drift described in previous paragraphs.

4.4. Time Source for Userspace

In HelenOS, userspace tasks don't communicate with the kernel in order to read the system time. Instead, a mechanism that shares kernel time of the the day counters with userspace address spaces is deployed. On the kernel side, during system initialization, HelenOS allocates a frame of physical memory and stores the time of the day counters there. The counters have the following structure:

  • first 32-bit counter for seconds,

  • 32-bit counter for microseconds and

  • second 32-bit counter for seconds.

One of the userspace tasks with capabilities of memory manager (e.g. ns) asks the kernel to map this frame into its address space. Other non-privileged tasks then use IPC to receive read-only share of this memory. Reading time in a userspace task is therefore just a matter of reading memory.

There are two interesting points about this. First, the counters are 32-bit even on 64-bit machines. The goal is to provide subsecond precision with the possibility to span roughly 136 years. Note that a single 64-bit microsecond counter could not be usually read atomically on 32-bit platforms. Unfortunately, on 32-bit platforms it is usually impossible to read atomically two 32-bit counters either. However, a generic protocol is used to guarantee that sequentially read times will create a non-decreasing sequence.

The problematic part is incrementing seconds counter and clearing microseconds counter together once every second. Seconds must be incremented and microseconds must be reset. However, without any synchronization, the two kernel stores and the two userspace reads can arbitrarily interleave. Furthemore, the reader has no chance to detect that the counters were updated only paritally. Therefore three counters are used in HelenOS.

If seconds need to be updated, the kernel increments the first second counter, issues a write memory barrier operation, updates the microsecond counter, issues another write memory barrier operation and increments the second second counter. When only microseconds needs to be updated, no special action is taken by the kernel. On the other hand, the userspace task must always read all three counters in reversed order. A read memory barrier operation must be issued between each two reads. A non-atomic read is detected when the two second counters differ. The userspace library solves this situation by returning higher of them with microseconds set to zero.

[4] When running in uniprocessor mode.