
Opened 8 years ago

Closed 8 years ago

#342 closed defect (fixed)

mips32: Memory management lockup

Reported by: Martin Decky
Owned by: Martin Decky
Priority: critical
Milestone: 0.5.0
Component: helenos/kernel/mips32
Version: mainline
Keywords:
Cc:
Blocker for:
Depends on:
See also:

Description

When running the malloc3 uspace test on mips32/GXemul (with 128 MB of physical memory), phase 2/subphase 1 locks up after three operations. The system as a whole and other tasks keep running, and it is not an out-of-memory situation. The main thread of the tester is sleeping in a wait queue in the kernel.

Initial investigation showed that the test removed an address space area at address X and immediately afterwards created an address space area at the same address X in the heap allocator. However, so far I have been unable to identify the exact reason for the sleep.

Hypothesis: Somehow (maybe due to some stale locks in the newly created address space area in place of the previously removed one) the deferred page fault gets blocked.

Suggestion: It would be very helpful if the kernel had some kconsole facility to identify the owners of synchronization primitives.
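To illustrate the suggestion, here is a minimal sketch of a lock that records its owner so a debugging facility could report it. All names (owned_lock_t, owned_trylock, owned_lock_owner) are hypothetical and not part of the HelenOS kernel. The sketch is a single-threaded model and leaves out the atomic operations a real primitive would need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock that remembers which thread holds it. */
typedef struct {
	bool locked;
	int owner_tid;	/* -1 while the lock is free */
} owned_lock_t;

#define OWNED_LOCK_INIT { false, -1 }

/* Try to take the lock for thread tid; record the owner on success. */
static bool owned_trylock(owned_lock_t *lock, int tid)
{
	if (lock->locked)
		return false;
	lock->locked = true;
	lock->owner_tid = tid;
	return true;
}

/* Release the lock; only the recorded owner may do so. */
static bool owned_unlock(owned_lock_t *lock, int tid)
{
	if (!lock->locked || lock->owner_tid != tid)
		return false;
	lock->locked = false;
	lock->owner_tid = -1;
	return true;
}

/* What a debugging command could print: the current owner, or -1. */
static int owned_lock_owner(const owned_lock_t *lock)
{
	return lock->locked ? lock->owner_tid : -1;
}
```

With such a field in place, a thread found sleeping on a lock could be matched to the thread recorded as its owner.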

Attachments (1)

mips32_malloc3.png (19.6 KB) - added by Martin Decky 8 years ago.
screenshot


Change History (7)

Changed 8 years ago by Martin Decky

Attachment: mips32_malloc3.png added

screenshot

comment:1 in reply to:  description Changed 8 years ago by Jakub Jermář

Replying to decky:

Suggestion: It would be very helpful if the kernel had some kconsole facility to identify the owners of synchronization primitives.

Theoretically, we could slightly change btrace to print the kernel stack of the sleeping thread and we should get a clue about the owner.

comment:2 Changed 8 years ago by Jakub Jermář

Looks like the thread is blocking on a futex:

0x81aa7e78: generic/src/synch/waitq.o:waitq_sleep_timeout_unsafe()+0x000000ac
0x81aa7ea8: generic/src/synch/waitq.o:waitq_sleep_timeout()+0x0000005c
0x81aa7ee8: generic/src/synch/futex.o:sys_futex_sleep()+0x000000cc
0x81aa7f10: generic/src/syscall/syscall.o:syscall_handler()+0x000000e0
0x81aa7f68: arch/mips32/src/start.o:syscall_shortcut()+0x00000038

comment:3 Changed 8 years ago by Jakub Jermář

Could the problem be that we hit an assert while holding the malloc_futex? In that case, any subsequent attempt to allocate memory (e.g. for printf()) would lead to a deadlock, wouldn't it?
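The suspected scenario can be sketched as follows. This is a simplified model, not the HelenOS allocator: heap_lock stands in for malloc_futex, and all function names are hypothetical. A try-lock is used so that the sketch reports the would-be deadlock instead of hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the non-recursive malloc_futex. */
static atomic_flag heap_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was acquired. */
static bool heap_trylock(void)
{
	return !atomic_flag_test_and_set(&heap_lock);
}

static void heap_unlock(void)
{
	atomic_flag_clear(&heap_lock);
}

/*
 * Models an allocation request. Returns true if a blocking lock
 * would have gone to sleep because the lock is already held.
 */
static bool demo_alloc_would_block(void)
{
	if (!heap_trylock())
		return true;
	heap_unlock();
	return false;
}

/*
 * Models an assertion failing inside the allocator: the failure
 * path (e.g. printing a message) allocates memory while the
 * allocator lock is still held by the same thread.
 */
static bool demo_assert_fails_under_lock(void)
{
	bool blocked;

	(void) heap_trylock();
	blocked = demo_alloc_would_block();
	heap_unlock();
	return blocked;
}
```

In this model, demo_assert_fails_under_lock() returns true: the nested allocation finds the lock taken by its own thread, which with a blocking futex means sleeping forever.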

comment:4 Changed 8 years ago by Jakub Jermář

Indeed - after commenting out all futexes in malloc.h, malloc3 test aborts due to a failed assertion:

static void area_check(void *addr)
{
        heap_area_t *area = (heap_area_t *) addr;

        assert(area->magic == HEAP_AREA_MAGIC);
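As a rough illustration of what such a magic check guards against, the sketch below uses a hypothetical demo_area_t header and a placeholder magic value, neither taken from the HelenOS sources. It shows only one way the check can fail (memory at the same address no longer carrying a valid header) and does not claim that this is the cause in this ticket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder value; the real HEAP_AREA_MAGIC is not shown here. */
#define DEMO_AREA_MAGIC UINT32_C(0x5AFEC0DE)

/* Hypothetical, simplified heap area header. */
typedef struct {
	uint32_t magic;
	size_t size;
} demo_area_t;

/* Write a valid header at the start of an area. */
static void demo_area_init(demo_area_t *area, size_t size)
{
	area->magic = DEMO_AREA_MAGIC;
	area->size = size;
}

/* Non-aborting variant of the check: true if the header is intact. */
static bool demo_area_is_valid(const void *addr)
{
	const demo_area_t *area = addr;

	return area->magic == DEMO_AREA_MAGIC;
}

/* Models fresh, zero-filled memory appearing at the same address. */
static void demo_area_replace_with_fresh_memory(demo_area_t *area)
{
	memset(area, 0, sizeof(*area));
}
```

After demo_area_replace_with_fresh_memory(), demo_area_is_valid() returns false for the same address, which is the condition the assertion would trip on.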

comment:5 Changed 8 years ago by Martin Decky

OK, so this is basically the same assertion as in ticket #337 on ppc32.

I am already investigating this. It is probably not a bug in the uspace heap allocator itself. One promising lead is that it has something to do with shared memory areas (due to IPC), but I haven't been able to verify this yet.

comment:6 Changed 8 years ago by Jakub Jermář

Resolution: fixed
Status: new → closed

Fixed the hang in changeset:mainline,974.

The failed assert is still there, but that is a different story.
