Opened 13 years ago
Closed 13 years ago
#342 closed defect (fixed)
mips32: Memory management lockup
Reported by: | Martin Decky | Owned by: | Martin Decky |
---|---|---|---|
Priority: | critical | Milestone: | 0.5.0 |
Component: | helenos/kernel/mips32 | Version: | mainline |
Keywords: | Cc: | ||
Blocker for: | Depends on: | ||
See also: |
Description
When running the malloc3 uspace test on mips32/GXemul (with 128 MB of physical memory), phase 2/subphase 1 locks up after three operations. The system as a whole and other tasks keep running, and it is not an out-of-memory situation. The main thread of the tester is sleeping in a wait queue in the kernel.
Initial investigation showed that the test (in the heap allocator) removed an address space area at address X and immediately afterwards created an address space area at the same address X. However, so far I have been unable to identify the exact reason for the sleep.
Hypothesis: Somehow (maybe due to some stale locks in the newly created address space area in place of the previously removed one) the deferred page fault gets blocked.
Suggestion: It would be very helpful if the kernel had some kconsole facility to identify the owners of synchronization primitives.
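To illustrate the suggestion, here is a minimal sketch of a lock that records its current holder so that a debugging tool can report it. This is not HelenOS code: `tracked_lock_t`, `TRACKED_LOCK_INIT` and the `tracked_lock_*` functions are hypothetical names, and the lock is a plain C11 `atomic_flag` try-lock rather than a kernel wait queue or futex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock that remembers a label of its current holder. */
typedef struct {
	atomic_flag held;
	const char *owner;  /* NULL while the lock is free */
} tracked_lock_t;

#define TRACKED_LOCK_INIT { ATOMIC_FLAG_INIT, NULL }

/* Try to take the lock; on success record who now holds it. */
static bool tracked_lock_try_acquire(tracked_lock_t *lock, const char *who)
{
	if (atomic_flag_test_and_set(&lock->held))
		return false;  /* already held, owner field is left untouched */

	lock->owner = who;
	return true;
}

/* Release the lock; releasing a lock nobody holds is a caller bug. */
static void tracked_lock_release(tracked_lock_t *lock)
{
	assert(lock->owner != NULL);
	lock->owner = NULL;
	atomic_flag_clear(&lock->held);
}

/*
 * Debugging aid: report the recorded holder. The read is deliberately
 * unsynchronized, so it is only a snapshot (good enough for a console
 * command inspecting a hung system, not for program logic).
 */
static const char *tracked_lock_owner(const tracked_lock_t *lock)
{
	return lock->owner;
}
```

With something along these lines, a thread that fails to take the lock (or an inspection command) could print `tracked_lock_owner()` and name the holder directly instead of inferring it from stack traces.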
Attachments (1)
Change History (7)
by , 13 years ago
Attachment: | mips32_malloc3.png added |
---|
comment:1 by , 13 years ago
Replying to decky:
> Suggestion: It would be very helpful if the kernel had some kconsole facility to identify the owners of synchronization primitives.
Theoretically, we could slightly change btrace to print the kernel stack of the sleeping thread and we should get a clue about the owner.
comment:2 by , 13 years ago
Looks like the thread is blocking on a futex:
0x81aa7e78: generic/src/synch/waitq.o:waitq_sleep_timeout_unsafe()+0x000000ac
0x81aa7ea8: generic/src/synch/waitq.o:waitq_sleep_timeout()+0x0000005c
0x81aa7ee8: generic/src/synch/futex.o:sys_futex_sleep()+0x000000cc
0x81aa7f10: generic/src/syscall/syscall.o:syscall_handler()+0x000000e0
0x81aa7f68: arch/mips32/src/start.o:syscall_shortcut()+0x00000038
comment:3 by , 13 years ago
Could the problem be that we hit an assert while holding the malloc_futex? In that case, any subsequent attempt to allocate memory (e.g. for printf()) would lead to a deadlock, wouldn't it?
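The suspected scenario can be modelled in a few lines. This is only a sketch under assumptions: `heap_lock` is a hypothetical stand-in for malloc_futex, built on a C11 `atomic_flag`, and a try-lock is used so that the model reports the situation instead of actually hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the allocator's non-recursive lock. */
static atomic_flag heap_lock = ATOMIC_FLAG_INIT;

static bool heap_try_lock(void)
{
	return !atomic_flag_test_and_set(&heap_lock);
}

static void heap_unlock(void)
{
	atomic_flag_clear(&heap_lock);
}

/*
 * Models the allocation made by the assertion-failure path (for example
 * by a formatted print). Returns false where a blocking lock would sleep.
 */
static bool failure_path_can_allocate(void)
{
	if (!heap_try_lock())
		return false;  /* lock is held: a real futex would block here */

	heap_unlock();
	return true;
}

/*
 * Models a heap operation whose consistency check fails while the heap
 * lock is still held, so the failure path runs under the lock.
 */
static bool heap_op_with_failed_check(void)
{
	bool locked = heap_try_lock();
	assert(locked);
	(void) locked;

	bool ok = failure_path_can_allocate();

	heap_unlock();
	return ok;
}
```

In this model `heap_op_with_failed_check()` returns false: the failure path cannot take the lock that its own thread already holds. With a blocking, non-recursive lock the thread would sleep forever at that point, which would be consistent with the futex sleep in the stack trace.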
comment:4 by , 13 years ago
Indeed: after commenting out all futexes in malloc.h, the malloc3 test aborts due to a failed assertion:
static void area_check(void *addr)
{
	heap_area_t *area = (heap_area_t *) addr;
	
	assert(area->magic == HEAP_AREA_MAGIC);
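For readers unfamiliar with this kind of check, the sketch below shows how a magic-value test detects an area header whose memory has been replaced. It is not the HelenOS allocator: `demo_area_t`, `DEMO_AREA_MAGIC` (an arbitrary placeholder value) and the `demo_area_*` functions are hypothetical, and the "replaced" header is simulated by zero-filling it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Arbitrary placeholder value, not the real HEAP_AREA_MAGIC. */
#define DEMO_AREA_MAGIC  UINT32_C(0x5a5aa5a5)

/* Hypothetical header stored at the start of each heap area. */
typedef struct {
	void *start;
	void *end;
	uint32_t magic;
} demo_area_t;

/* Initialize a header describing an area of the given size. */
static void demo_area_init(demo_area_t *area, void *start, size_t size)
{
	assert(area != NULL);
	area->start = start;
	area->end = (char *) start + size;
	area->magic = DEMO_AREA_MAGIC;
}

/* Non-aborting variant of the check: true if the header looks intact. */
static bool demo_area_is_valid(const void *addr)
{
	const demo_area_t *area = addr;
	return area->magic == DEMO_AREA_MAGIC;
}

/*
 * Simulates the header memory being replaced by fresh, zero-filled
 * memory, as could happen if the backing pages were swapped out from
 * under the allocator.
 */
static void demo_area_replace_backing(demo_area_t *area)
{
	memset(area, 0, sizeof(*area));
}
```

A header passes `demo_area_is_valid()` after `demo_area_init()` and fails it after `demo_area_replace_backing()`. A failing magic check therefore says that the header contents changed; it does not by itself say who changed them.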
comment:5 by , 13 years ago
OK, so this is basically the same assertion as in ticket #337 on ppc32.
I am already investigating this. It is probably not a bug in the uspace heap allocator itself. One promising lead is that it has something to do with shared memory areas (due to IPC), but I haven't been able to verify this yet.
comment:6 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fixed the hang in changeset:mainline,974.
The failed assert is still there, but that is a different story.
screenshot