Changes between Initial Version and Version 9 of Ticket #396


Timestamp:
2012-06-20T23:59:53Z (12 years ago)
Author:
Jakub Jermář
Comment:

Ok, I think I can explain the mechanism of this corruption, at last!

Threads that have an active FPU context (fpu_context_engaged is true) must not be migrated. The kcpulb load-balancing thread honours this, but, as it turns out, that is not enough to prevent a thread with an active FPU context from being migrated to another CPU. Imagine that a thread whose FPU context is still on a CPU goes to sleep. The CPU which wakes the thread up again (for whatever reason) will put it into its own run queue and thereby effectively migrate it. Since the lazy FPU context switching code relies on the FPU context owner never being migrated under any circumstances, this is a severe flaw in our thread_ready() function.
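To make the constraint concrete, here is a minimal sketch of the CPU choice a wake-up path would have to make. It is a toy model, not the HelenOS code: `choose_ready_cpu()` and the reduced `thread_t`/`cpu_t` structures are hypothetical, and only the `fpu_context_engaged` flag is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures (hypothetical, heavily reduced). */
typedef struct cpu {
    int id;
} cpu_t;

typedef struct thread {
    cpu_t *cpu;               /* CPU that last executed the thread */
    bool fpu_context_engaged; /* live FPU context still sits on that CPU */
} thread_t;

/*
 * Pick the CPU whose run queue should receive a thread that is being woken.
 * A thread with an engaged FPU context is pinned to the CPU that last ran it;
 * any other thread may go to the run queue of the waking CPU.
 */
static cpu_t *choose_ready_cpu(const thread_t *t, cpu_t *waking_cpu)
{
    if (t->fpu_context_engaged) {
        assert(t->cpu != NULL); /* an engaged context implies a previous CPU */
        return t->cpu;
    }
    return waking_cpu;
}
```

With this rule, a thread woken by CPU2 while its FPU context is still engaged on CPU1 is queued on CPU1 again, so the assumption made by the lazy FPU switching code continues to hold.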

Now, as for what is probably happening, let us assume that the tester malloc1 thread was running on CPU1 for some time. Its saved_fpu_context contains the last saved version of its FPU context. The saved FPU context becomes obsolete as soon as the thread runs again. The current version of the FPU context exists only on CPU1, the last CPU which executed the thread. Now let us further assume that the thread got blocked in the kernel, e.g. on some mutex. CPU2 later wakes up our thread, which effectively migrates it to CPU2, thanks to the error in thread_ready(). When the thread executes its first FPU instruction in fill_block(), it triggers an exception on CPU2, which results in a call to scheduler_fpu_lazy_request(). In that function, the FPU on CPU2 loads the thread's saved FPU context, which is now obsolete (but may be quite similar to the current one). The current FPU context of the thread is still on CPU1. When the thread resumes execution of userspace code, it works with a stale FPU context. Had the thread not been migrated, the behavior would have been correct, because the on-CPU FPU state would first have been saved into the thread's saved_fpu_context, so that scheduler_fpu_lazy_request() would load the correct context a little later.
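The scenario can be replayed in a small user-space model. This is only a sketch under simplifying assumptions: an FPU context is reduced to a single integer, `toy_fpu_lazy_request()` merely imitates the save/load behaviour described above rather than the real scheduler_fpu_lazy_request(), and all `toy_*` names, `run_scenario()` and the second thread `other` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct thread {
    int saved_fpu_context;    /* last FPU context written back to memory */
    bool fpu_context_engaged; /* live context currently sits in some FPU */
} thread_t;

typedef struct cpu {
    int fpu_regs;             /* live FPU contents, modelled as one integer */
    thread_t *fpu_owner;      /* thread whose context is in fpu_regs */
} cpu_t;

/* Imitates the lazy FPU handler running on `cpu` on behalf of thread `t`. */
static void toy_fpu_lazy_request(cpu_t *cpu, thread_t *t)
{
    assert(cpu != NULL && t != NULL);
    if (cpu->fpu_owner == t)
        return; /* the live context is already on this CPU */
    if (cpu->fpu_owner != NULL) {
        /* Write the previous owner's live context back first. */
        cpu->fpu_owner->saved_fpu_context = cpu->fpu_regs;
        cpu->fpu_owner->fpu_context_engaged = false;
    }
    cpu->fpu_regs = t->saved_fpu_context;
    cpu->fpu_owner = t;
    t->fpu_context_engaged = true;
}

/* The thread executes an FPU instruction that writes `value`. */
static void toy_fpu_store(cpu_t *cpu, thread_t *t, int value)
{
    toy_fpu_lazy_request(cpu, t);
    cpu->fpu_regs = value;
}

/* The thread executes an FPU instruction that reads its state. */
static int toy_fpu_load(cpu_t *cpu, thread_t *t)
{
    toy_fpu_lazy_request(cpu, t);
    return cpu->fpu_regs;
}

/* Returns the FPU value the malloc1 thread observes after being woken up. */
static int run_scenario(bool woken_by_cpu2)
{
    cpu_t cpu1 = { 0, NULL }, cpu2 = { 0, NULL };
    thread_t malloc1 = { 0, false }, other = { 0, false };

    toy_fpu_store(&cpu1, &malloc1, 1); /* malloc1 uses the FPU on CPU1 */
    toy_fpu_store(&cpu1, &other, 99);  /* malloc1's context (1) is saved */
    toy_fpu_store(&cpu1, &malloc1, 2); /* saved copy (1) is now obsolete */

    /* malloc1 blocks; its current context (2) exists only on CPU1. */
    cpu_t *next = woken_by_cpu2 ? &cpu2 : &cpu1;
    return toy_fpu_load(next, &malloc1);
}
```

In this model, `run_scenario(false)` yields the current value 2, while `run_scenario(true)` yields the obsolete value 1, which corresponds to the thread resuming on CPU2 with a stale FPU context.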

Note that this could also have negatively affected thread destruction, because thread_destroy() checks only the current CPU to see whether the thread being destroyed is recorded there as the FPU owner. Running this code on the wrong CPU defeats the algorithm and has the potential to cause further kernel memory corruption.
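The destruction problem can be illustrated in the same toy style. The sketch below assumes that the owner check looks only at the CPU executing the destroy path; `toy_thread_destroy()` and `owner_dangles_after_destroy()` are hypothetical names, and no memory is actually freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct thread {
    int id;
} thread_t;

typedef struct cpu {
    thread_t *fpu_owner; /* thread whose FPU context lives on this CPU */
} cpu_t;

/*
 * Imitates a destroy path that clears the FPU owner record only on the
 * CPU it happens to run on. The real code would free the thread afterwards.
 */
static void toy_thread_destroy(cpu_t *current_cpu, thread_t *t)
{
    assert(current_cpu != NULL && t != NULL);
    if (current_cpu->fpu_owner == t)
        current_cpu->fpu_owner = NULL;
}

/* Reports whether CPU1 still points at the thread after it was destroyed. */
static bool owner_dangles_after_destroy(bool destroyed_on_cpu1)
{
    cpu_t cpu1 = { NULL }, cpu2 = { NULL };
    thread_t t = { 1 };

    cpu1.fpu_owner = &t; /* the thread's FPU context lives on CPU1 */
    toy_thread_destroy(destroyed_on_cpu1 ? &cpu1 : &cpu2, &t);
    return cpu1.fpu_owner == &t;
}
```

If the thread is destroyed on CPU2, CPU1 keeps a pointer to it, and a later lazy FPU save on CPU1 would write into memory that no longer belongs to the thread.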

Fixed in mainline,1532.

  • Ticket #396

    • Property Component changed from helenos/unspecified to helenos/uspace/libc
    • Property Keywords malloc amd64 fpu added
    • Property Owner set to Jakub Jermář
    • Property Status changed from new to closed
    • Property Resolution set to fixed