Opened 7 years ago

Closed 7 years ago

#458 closed defect (fixed)

Deadlocks when memory management is under pressure

Reported by: Jakub Jermář
Owned by: Jakub Jermář
Priority: major
Milestone: 0.5.0
Component: helenos/kernel/generic
Version: mainline
Keywords: mm
Cc:
Blocker for:
Depends on:
See also: #445

Description

As of mainline,1486, running tester malloc1 together with kconsole's test *, or running two instances of tester malloc1 on an SMP system, may deadlock the kernel in various ways.

One such deadlock is depicted in the attached picture (courtesy of Maurizio).

Some of these deadlocks are caused by a TLB shootdown sequence spinning on a non-IRQ spinlock that is held by another CPU, which has itself been interrupted by the TLB shootdown IPI.
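To make the cycle explicit, here is a toy wait-for model of this scenario. It is a sketch, not HelenOS code: the names `find_wait_cycle` and `build_waits` are hypothetical, and it assumes that the IPI handler on the interrupted CPU waits until the initiating CPU finishes the shootdown sequence.

```python
# Toy model only; nothing here is taken from the HelenOS sources.

def find_wait_cycle(waits_for, start):
    """Follow 'X waits for Y' edges from start and return the cycle, if any."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ends, so somebody can make progress
    return path[path.index(node):]


def build_waits(initiator_spins_on_lock):
    # cpu0 took a non-IRQ spinlock with interrupts enabled, was interrupted by
    # the shootdown IPI, and (assumption) waits for cpu1 to finish the sequence.
    waits = {"cpu0": "cpu1"}
    if initiator_spins_on_lock:
        # cpu1 runs the shootdown sequence and spins on the lock held by cpu0.
        waits["cpu1"] = "cpu0"
    return waits
```

With `initiator_spins_on_lock=True` the two CPUs wait for each other and `find_wait_cycle` reports the cycle. If the shootdown sequence never spins on such a lock, the chain ends at cpu1 and no cycle exists.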

Other deadlocks do not seem to be related to TLB shootdown, but are most likely related to #445 and the fact that the system is running low on memory.

Some deadlocks are not even fully reported in the current mainline because they involve mutexes and possibly other synchronization primitives.

Attachments (1)

spinlock_loop.png (17.9 KB) - added by Jakub Jermář 7 years ago.

Change History (5)

Changed 7 years ago by Jakub Jermář

Attachment: spinlock_loop.png added

comment:1 Changed 7 years ago by Jakub Jermář

Status: new → accepted

comment:2 Changed 7 years ago by Jakub Jermář

In mainline,1489, I merged a couple of fixes that ensure that the TLB shootdown sequences will not spin on any range, slab or frame allocator lock. Let us see if there are any other memory management related deadlocks now.

comment:3 Changed 7 years ago by Jakub Jermář

So far, after mainline,1489, I only noticed hangs and deadlocks of the following kind:

  • exception → CPU lock acquired → FPU lazy context switch → page fault → deadlock on CPU lock acquired
  • two instances of tester malloc1 combined with kconsole's test * hung shortly after printing the message about waiting for N frames. This can happen when the two testers reserve memory and kconsole then allocates that same memory (the kernel never reserves it in advance). After that, any blocking memory allocation in the kernel (e.g. as part of syscall processing) is enough to stop all forward progress.

The former suggests there is some problem with maintaining the FPU context (wrong or corrupted thread pointer).
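The first chain can be sketched as a toy model of a non-recursive lock being re-acquired on the same CPU. Everything here is hypothetical (`NonRecursiveLock`, `exception`, `fpu_lazy_switch`, `page_fault`, and the `fpu_context_valid` flag are illustration names), and it assumes the page fault path takes the same CPU lock that the exception path already holds.

```python
# Toy model only; a real spinlock would spin forever where this raises.

class NonRecursiveLock:
    def __init__(self, name):
        self.name = name
        self.owner = None

    def acquire(self, cpu):
        if self.owner == cpu:
            raise RuntimeError(f"{cpu} re-acquires {self.name}: self-deadlock")
        self.owner = cpu

    def release(self, cpu):
        assert self.owner == cpu
        self.owner = None


def page_fault(cpu, cpu_lock):
    # Assumption: the fault handler needs the CPU lock as well.
    cpu_lock.acquire(cpu)
    cpu_lock.release(cpu)


def fpu_lazy_switch(cpu, cpu_lock, fpu_context_valid):
    # A wrong or corrupted thread pointer is modelled as an invalid context,
    # which makes touching the FPU context fault.
    if not fpu_context_valid:
        page_fault(cpu, cpu_lock)


def exception(cpu, cpu_lock, fpu_context_valid):
    cpu_lock.acquire(cpu)
    try:
        fpu_lazy_switch(cpu, cpu_lock, fpu_context_valid)
    finally:
        cpu_lock.release(cpu)
```

With a valid FPU context `exception` completes normally. With an invalid one, the nested page fault tries to take the lock that the same CPU already holds, which is the hang described above.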

The latter suggests there may still be some blocking allocations for secondary structures (i.e. other than user pages) in the syscall callpaths; some of the kernel tests may use blocking allocations too. The syscall paths need to be cleaned up. Running uspace tests together with kernel tests, where both try to allocate as much memory as possible, is a bad idea: since both parties may block, it can inherently lead to this kind of pathological behavior.
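The second scenario can be illustrated with a toy frame pool. This is a sketch under stated assumptions, not the HelenOS reservation API: `FramePool`, `reserve` and `kernel_alloc` are hypothetical names, and the model assumes that user space reserves frames up front while the kernel takes free frames without reserving them.

```python
# Toy model only; the frame counts are made up for illustration.

class FramePool:
    def __init__(self, total):
        self.free = total      # frames physically free
        self.reserved = 0      # frames promised to user space, not yet handed out

    def reserve(self, n):
        """User space: promise n frames for later; fails instead of blocking."""
        if self.free - self.reserved < n:
            return False
        self.reserved += n
        return True

    def kernel_alloc(self, n):
        """Kernel: takes frames without a reservation. False means 'would block'."""
        if self.free < n:
            return False
        self.free -= n
        return True


pool = FramePool(total=100)
tester_a = pool.reserve(50)           # first tester reserves half
tester_b = pool.reserve(50)           # second tester reserves the rest
kconsole = pool.kernel_alloc(100)     # kernel test takes the very same frames
syscall_alloc = pool.kernel_alloc(1)  # any later blocking allocation now waits
```

Both reservations succeed and the kernel allocation still drains the pool, so the final single-frame allocation would block. If that allocation sits in a syscall path of one of the testers, nobody is left to release memory.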

If no other issues are reported soon, I will be inclined to close this ticket as fixed because the TLB-related deadlocks don't seem to be reproducible after mainline,1489.

comment:4 Changed 7 years ago by Jakub Jermář

Resolution: fixed
Status: accepted → closed

Ok, closing as fixed. Please file a new ticket if a new issue occurs.