Opened 13 years ago
Closed 10 years ago
#507 closed defect (fixed)
Kernel assertion fail at phone_deallocp() at generic/src/ipc/ipcrsc.c:223 phone->state == IPC_PHONE_CONNECTING
| Reported by: | Jan Vesely | Owned by: | Jakub Jermář | 
|---|---|---|---|
| Priority: | major | Milestone: | 0.7.0 | 
| Component: | helenos/kernel/generic | Version: | mainline | 
| Keywords: | ipc | Cc: | |
| Blocker for: | | Depends on: | |
| See also: | | | |
Description
A bug in the UHCI driver caused an IPC storm that produced this:
failed assertion
phone_deallocp() at generic/src/ipc/ipcrsc.c:223
phone->state == IPC_PHONE_CONNECTING
THE=0xbe304000: pe=0 thr=0xbe1a3898 task=0xbe302000 cpu=0xbf283000 as=0x8009c924 magic=0xfacefeed
0xbe305e5c:stacktrace.o:stack_trace()+0x13
0xbe305e9c:panic.o:panic_common()+0x14c
0xbe305edc:ipcrsc.o:phone_connect()
0xbe305efc:conctmeto.o:answer_process()+0x25
0xbe305f5c:sysipc.o:sys_ipc_wait_for_call()+0x77
0xbe305fac:syscall.o:syscall_handler()+0xb8
0xbe305fd0:asm.o:sysenter_handler()+4c
Change History (5)
comment:1 by , 13 years ago
comment:2 by , 12 years ago
| Status: | new → accepted | 
|---|
comment:3 by , 11 years ago
| Milestone: | 0.6.0 → 0.7.0 | 
|---|
comment:4 by , 10 years ago
Jan Mareš provided a reproducible test case for a problem that seems to be a duplicate of this one:
http://lists.modry.cz/private/helenos-devel/2015-June/007599.html
The problem seems to be that all phones of the panicking task are already connected, so request_preprocess() for IPC_M_CONNECT_ME_TO simply returns ELIMIT because it cannot find a free phone. This triggers the ipc_backsend_err() handling, which automatically answers the request with the error code. So far so good. It is answer_process() that is not ready to handle this situation: it assumes that a phone _has_ been allocated (and that the answer comes from the actual callee and not from the caller itself). answer_process() interprets call->priv as the phoneid, but call->priv is not initialized in this case.
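The failure path described above can be sketched with a small stand-alone model. This is not the HelenOS code and not necessarily how the fix in mainline works: every `model_*` and `MODEL_*` name, the `phone_valid` flag and the table size are assumptions made for illustration only. The sketch shows why the answer handler must not trust the stored phone id when the preprocessing step failed before allocating a phone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; none of these names exist in HelenOS. */
#define MODEL_PHONES 4
#define MODEL_EOK 0
#define MODEL_ELIMIT (-1)

typedef enum { PHONE_FREE, PHONE_CONNECTING, PHONE_CONNECTED } phone_state_t;

typedef struct {
    phone_state_t phones[MODEL_PHONES];
} model_task_t;

typedef struct {
    int retval;
    int phoneid;       /* stands in for call->priv */
    bool phone_valid;  /* assumed guard: true only if a phone was allocated */
} model_call_t;

/* Find a free phone and mark it as connecting; return -1 if none is free. */
static int model_phone_alloc(model_task_t *task)
{
    for (int i = 0; i < MODEL_PHONES; i++) {
        if (task->phones[i] == PHONE_FREE) {
            task->phones[i] = PHONE_CONNECTING;
            return i;
        }
    }
    return -1;
}

/* Models the preprocessing step: may fail before any phone is allocated. */
static int model_request_preprocess(model_task_t *task, model_call_t *call)
{
    int id = model_phone_alloc(task);
    if (id < 0) {
        call->phone_valid = false;  /* phoneid is left untouched */
        return MODEL_ELIMIT;
    }
    call->phoneid = id;
    call->phone_valid = true;
    return MODEL_EOK;
}

/*
 * Models the answer handler. Returns false where the real kernel would
 * hit the assertion (deallocating a phone that is not connecting).
 */
static bool model_answer_process(model_task_t *task, model_call_t *call)
{
    if (call->retval != MODEL_EOK) {
        if (!call->phone_valid)
            return true;  /* nothing was allocated, nothing to free */
        if (task->phones[call->phoneid] != PHONE_CONNECTING)
            return false;
        task->phones[call->phoneid] = PHONE_FREE;
        return true;
    }
    task->phones[call->phoneid] = PHONE_CONNECTED;
    return true;
}
```

Without the `phone_valid` check, the error branch would index the phone table with whatever value happens to be stored in `phoneid`, which corresponds to the uninitialized call->priv described above.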
comment:5 by , 10 years ago
| Keywords: | ipc added | 
|---|---|
| Resolution: | → fixed | 
| Status: | accepted → closed | 
Fixed in mainline,2336.


The panicking task is waiting for an answer to the IPC_M_CONNECT_ME_TO call and, when it gets it, starts to process it using the conctmeto.c::answer_process() callback. The callback sees that the answer has a non-zero retval, so it assumes the phone allocated in conctmeto.c::request_preprocess() is still in the IPC_PHONE_CONNECTING state and attempts to deallocate it via phone_dealloc(), which hits the assertion. Note that conctmeto.c::answer_preprocess() should not connect the phone, and thus should not modify its state, on a non-zero retval. It would be instrumental to know what state the phone was actually in at the time of the crash. Without this knowledge we can only speculate: