Opened 13 years ago

Closed 13 years ago

#342 closed defect (fixed)

mips32: Memory management lockup

Reported by: Martin Decky
Owned by: Martin Decky
Priority: critical
Milestone: 0.5.0
Component: helenos/kernel/mips32
Version: mainline
Keywords:
Cc:
Blocker for:
Depends on:
See also:

Description

When running the malloc3 uspace test on mips32/GXemul (with 128 MB of physical memory), phase 2/subphase 1 locks up after three operations. The system as a whole and other tasks are still running, so it is not an out-of-memory situation. The main thread of the tester is sleeping in a wait queue in the kernel.

Initial investigation showed that the test removed an address space area at address X and immediately afterwards created an address space area at the same address X in the heap allocator. However, so far I have been unable to identify the exact reason for the sleep.

Hypothesis: Somehow (maybe due to some stale locks in the newly created address space area in place of the previously removed one) the deferred page fault gets blocked.

Suggestion: It would be very helpful if the kernel had some kconsole facility to identify the owners of synchronization primitives.

Attachments (1)

mips32_malloc3.png (19.6 KB) - added by Martin Decky 13 years ago.
screenshot


Change History (7)

by Martin Decky, 13 years ago

Attachment: mips32_malloc3.png added

screenshot

comment:1 by Jakub Jermář, 13 years ago (in reply to: description)

Replying to decky:

> Suggestion: It would be very helpful if the kernel had some kconsole facility to identify the owners of synchronization primitives.

Theoretically, we could slightly change btrace to print the kernel stack of the sleeping thread and we should get a clue about the owner.
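To make the idea of identifying the owner of a synchronization primitive concrete, here is a minimal sketch of a lock that records who holds it, so that a debugging command could report the holder. All names (demo_thread_t, demo_lock_t, demo_lock_owner_tid) are hypothetical and do not correspond to the HelenOS kernel API. The sketch is single-threaded and deliberately omits the atomic operations a real lock would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thread descriptor; the real kernel structure differs. */
typedef struct {
	int tid;
	const char *name;
} demo_thread_t;

/* Hypothetical lock that remembers its current holder. */
typedef struct {
	bool locked;
	const demo_thread_t *owner;  /* NULL while the lock is free */
} demo_lock_t;

/* Try to take the lock on behalf of 'self'; record the owner on success. */
static bool demo_lock_try(demo_lock_t *lock, const demo_thread_t *self)
{
	if (lock->locked)
		return false;

	lock->locked = true;
	lock->owner = self;
	return true;
}

/* Release the lock and forget the owner. */
static void demo_lock_release(demo_lock_t *lock)
{
	assert(lock->locked);
	lock->owner = NULL;
	lock->locked = false;
}

/* What a debugging command could print: the holder's ID, or -1 if free. */
static int demo_lock_owner_tid(const demo_lock_t *lock)
{
	return (lock->owner != NULL) ? lock->owner->tid : -1;
}
```

With such a field in place, a thread found sleeping on the lock could be matched directly to the thread that holds it, instead of inferring the owner from stack traces.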

comment:2 by Jakub Jermář, 13 years ago

Looks like the thread is blocking on a futex:

0x81aa7e78: generic/src/synch/waitq.o:waitq_sleep_timeout_unsafe()+0x000000ac
0x81aa7ea8: generic/src/synch/waitq.o:waitq_sleep_timeout()+0x0000005c
0x81aa7ee8: generic/src/synch/futex.o:sys_futex_sleep()+0x000000cc
0x81aa7f10: generic/src/syscall/syscall.o:syscall_handler()+0x000000e0
0x81aa7f68: arch/mips32/src/start.o:syscall_shortcut()+0x00000038

comment:3 by Jakub Jermář, 13 years ago

Could the problem be that we hit an assert while holding the malloc_futex? In that case, any subsequent attempt to allocate memory (e.g. for printf()) would lead to a deadlock, wouldn't it?
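The suspected scenario can be sketched in portable C. This is only an illustration under assumptions: a POSIX mutex stands in for the allocator's futex, the function names are hypothetical, and pthread_mutex_trylock() is used so that the sketch reports the conflict instead of actually hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the allocator's futex (not the HelenOS API). */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for an allocation made on the assertion-failure path, for
 * example by a printf() that needs a buffer. Returns true if a blocking
 * lock operation would have had to wait for the heap lock.
 */
static bool nested_alloc_would_block(void)
{
	int rc = pthread_mutex_trylock(&heap_lock);
	if (rc == EBUSY)
		return true;

	if (rc == 0)
		pthread_mutex_unlock(&heap_lock);
	return false;
}

/*
 * Stand-in for a heap operation whose consistency check fails while the
 * heap lock is still held. The failure path then tries to allocate.
 */
static bool failing_check_under_lock(void)
{
	int rc = pthread_mutex_lock(&heap_lock);
	assert(rc == 0);

	bool blocked = nested_alloc_would_block();

	pthread_mutex_unlock(&heap_lock);
	return blocked;
}
```

If the nested call used a blocking lock operation on a non-recursive lock, the thread would wait for itself forever, which would match a tester thread sleeping in a wait queue while the rest of the system keeps running.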

comment:4 by Jakub Jermář, 13 years ago

Indeed: after commenting out all futexes in malloc.h, the malloc3 test aborts due to a failed assertion:

static void area_check(void *addr)
{
        heap_area_t *area = (heap_area_t *) addr;

        assert(area->magic == HEAP_AREA_MAGIC);
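For readers unfamiliar with this kind of check, the following is a minimal sketch of magic-number validation of a heap area header. The structure layout, the magic value, and the names (demo_area_t, DEMO_AREA_MAGIC, demo_area_valid) are assumptions made for illustration and are not taken from the HelenOS sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical magic value; the real HEAP_AREA_MAGIC is defined elsewhere. */
#define DEMO_AREA_MAGIC  UINT32_C(0x5ca1ab1e)

/* Hypothetical header stored at the start of each heap area. */
typedef struct {
	uint32_t magic;  /* must equal DEMO_AREA_MAGIC */
	void *start;     /* first byte of the area */
	void *end;       /* one past the last byte of the area */
} demo_area_t;

/*
 * Return true if 'addr' points at a plausible area header: the magic
 * matches and the recorded bounds are ordered.
 */
static bool demo_area_valid(const void *addr)
{
	const demo_area_t *area = addr;

	return (area->magic == DEMO_AREA_MAGIC) &&
	    ((uintptr_t) area->start < (uintptr_t) area->end);
}
```

A check like this fails when the header has been overwritten or when the pointer refers to memory that no longer holds a live area, so a failed assertion points at corruption or a stale pointer rather than at the check itself.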

comment:5 by Martin Decky, 13 years ago

OK, so this is basically the same assertion as in ticket #337 on ppc32.

I am already investigating this. It is probably not a bug in the uspace heap allocator itself. One promising lead is that it has something to do with shared memory areas (due to IPC), but I haven't been able to verify this yet.

comment:6 by Jakub Jermář, 13 years ago

Resolution: fixed
Status: new → closed

Fixed the hang in changeset:mainline,974.

The failed assert is still there, but that is a different story.
