Opened 11 years ago

Closed 9 years ago

#507 closed defect (fixed)

Kernel assertion failure at phone_deallocp() at generic/src/ipc/ipcrsc.c:223 phone->state == IPC_PHONE_CONNECTING

Reported by: Jan Vesely Owned by: Jakub Jermář
Priority: major Milestone: 0.7.0
Component: helenos/kernel/generic Version: mainline
Keywords: ipc Cc:
Blocker for: Depends on:
See also:

Description

A bug in the UHCI driver caused an IPC storm that produced this:

failed assertion
phone_deallocp() at generic/src/ipc/ipcrsc.c:223
phone->state == IPC_PHONE_CONNECTING

THE=0xbe304000: pe=0 thr=0xbe1a3898 task=0xbe302000 cpu=0xbf283000 as=0x8009c924 magic=0xfacefeed
0xbe305e5c:stacktrace.o:stack_trace()+0x13
0xbe305e9c:panic.o:panic_common()+0x14c
0xbe305edc:ipcrsc.o:phone_connect()
0xbe305efc:conctmeto.o:answer_process()+0x25
0xbe305f5c:sysipc.o:sys_ipc_wait_for_call()+0x77
0xbe305fac:syscall.o:syscall_handler()+0xb8
0xbe305fd0:asm.o:sysenter_handler()+0x4c

Change History (5)

comment:1 by Jakub Jermář, 11 years ago

The panicking task is waiting for an answer to an IPC_M_CONNECT_ME_TO call. When the answer arrives, the task processes it in the conctmeto.c::answer_process() callback. The callback sees that the answer has a non-zero retval, so it assumes that the phone allocated in conctmeto.c::request_preprocess() is still in the IPC_PHONE_CONNECTING state and tries to deallocate it via phone_dealloc(), which trips the assertion. Note that conctmeto.c::answer_preprocess() is not supposed to connect the phone, and thus modify its state, when retval is non-zero. It would help to know which state the phone was actually in at the time of the crash. Without that information we can only speculate:

  • The phone might have been in the IPC_PHONE_CONNECTED state, which would mean that the call was first answered affirmatively with EOK and that its retval somehow changed between the conctmeto.c::answer_preprocess() and conctmeto.c::answer_process() callbacks. This includes the possibility that the retval was corrupted, or deliberately changed while some corner case was being handled.
  • The phone might have been in a state other than IPC_PHONE_CONNECTING and IPC_PHONE_CONNECTED, which would point to a logic error in the phone state transitions performed while handling IPC_M_CONNECT_ME_TO.
  • The phone state might have been overwritten by some unrelated kernel memory corruption, which would also explain the failed assertion.
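The reasoning above rests on a small state machine. The following C sketch is a simplified, hypothetical model of it; `phone_t`, the `PHONE_*` states and the `model_*` functions are illustrative stand-ins, not the HelenOS kernel API. A phone allocated for IPC_M_CONNECT_ME_TO starts out connecting, only a successful answer connects it, and the error path may release it only while it is still connecting. The model shows how the first hypothesis, a retval that changes between the two callbacks, would violate that precondition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified phone states (not the real kernel enum). */
typedef enum { PHONE_FREE, PHONE_CONNECTING, PHONE_CONNECTED } phone_state_t;

typedef struct {
    phone_state_t state;
} phone_t;

/* Models request_preprocess(): a free phone is reserved and left CONNECTING. */
static void model_request_preprocess(phone_t *p) {
    assert(p->state == PHONE_FREE);
    p->state = PHONE_CONNECTING;
}

/* Models answer_preprocess(): only a successful answer (retval == 0)
 * moves the phone to CONNECTED. */
static void model_answer_preprocess(phone_t *p, int retval) {
    if (retval == 0)
        p->state = PHONE_CONNECTED;
}

/* Models answer_process(): on an error answer the phone is released.
 * Returns false where the kernel would instead hit the assertion
 * phone->state == IPC_PHONE_CONNECTING. */
static bool model_answer_process(phone_t *p, int retval) {
    if (retval == 0)
        return true;
    if (p->state != PHONE_CONNECTING)
        return false;   /* precondition violated: assertion would fire */
    p->state = PHONE_FREE;
    return true;
}
```

Under this model, an error answer seen consistently by both callbacks (for example retval -1 in each) frees the phone cleanly. An answer that is 0 in model_answer_preprocess() but non-zero in model_answer_process() leaves the phone CONNECTED and makes model_answer_process() return false, which corresponds to the observed assertion failure.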

comment:2 by Jakub Jermář, 11 years ago

Status: new → accepted

comment:3 by Martin Decky, 9 years ago

Milestone: 0.6.0 → 0.7.0

comment:4 by Jakub Jermář, 9 years ago

Jan Mareš provided a reproducible test case for a problem that appears to be a duplicate of this one:

http://lists.modry.cz/private/helenos-devel/2015-June/007599.html

The problem seems to be that all phones of the panicking task are already connected, so request_preprocess() for IPC_M_CONNECT_ME_TO simply returns ELIMIT because it cannot find a free phone. The call is then handled by ipc_backsend_err(), which automatically answers the request with the error code. So far so good. It is answer_process() that is not ready for this situation: it assumes that a phone _has_ been allocated (and that the answer comes from the actual callee rather than from the caller itself). answer_process() interprets call->priv as the phone ID, but call->priv is not initialized in this case.
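The failure path described above can be sketched in the same spirit. In this hypothetical C model, names such as `task_t`, `call_t`, `NO_PHONE`, `MODEL_PHONES` and the `model_*` functions are illustrative, not the kernel's. Every phone of the task may already be in use, in which case the request fails with an ELIMIT-like error and is answered on the spot, as ipc_backsend_err() does. One possible guard is to give call->priv a known "no phone" value before any allocation is attempted and to have the answer handler skip the release when it sees that value. This guard is an assumption for illustration and is not necessarily what the fix in mainline,2336 does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PHONES 4      /* hypothetical per-task phone limit */
#define MODEL_ELIMIT (-1)   /* stand-in for ELIMIT */
#define NO_PHONE     (-1)   /* hypothetical "no phone allocated" marker */

typedef enum { PHONE_FREE, PHONE_CONNECTING, PHONE_CONNECTED } phone_state_t;

typedef struct {
    phone_state_t phones[MODEL_PHONES];
} task_t;

typedef struct {
    int retval;
    intptr_t priv;   /* phone id, valid only if a phone was allocated */
} call_t;

/* Reserve a free phone; returns its id, or MODEL_ELIMIT if all are in use. */
static int model_phone_alloc(task_t *t) {
    for (int i = 0; i < MODEL_PHONES; i++) {
        if (t->phones[i] == PHONE_FREE) {
            t->phones[i] = PHONE_CONNECTING;
            return i;
        }
    }
    return MODEL_ELIMIT;
}

/* Models request_preprocess(). On failure the call is answered with the
 * error immediately (like ipc_backsend_err()) and priv keeps NO_PHONE. */
static int model_request_preprocess(task_t *t, call_t *call) {
    call->priv = NO_PHONE;      /* guard: never leave priv uninitialized */
    int id = model_phone_alloc(t);
    if (id < 0) {
        call->retval = id;      /* automatic error answer */
        return id;
    }
    call->priv = id;
    return 0;
}

/* Models answer_process() with the guard: release only a phone that was
 * really allocated for this call. Returns false on a state violation. */
static bool model_answer_process(task_t *t, const call_t *call) {
    if (call->retval == 0)
        return true;
    if (call->priv == NO_PHONE)
        return true;            /* nothing was allocated, nothing to free */
    if (t->phones[call->priv] != PHONE_CONNECTING)
        return false;           /* would trip the kernel assertion */
    t->phones[call->priv] = PHONE_FREE;
    return true;
}
```

Without the first assignment in model_request_preprocess(), the error path would index the phone table with whatever value priv happened to hold, which mirrors the uninitialized call->priv described in this comment.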

comment:5 by Jakub Jermář, 9 years ago

Keywords: ipc added
Resolution: fixed
Status: accepted → closed

Fixed in mainline,2336.
