Opened 12 years ago

Closed 12 years ago

#459 closed defect (fixed)

Failed assertion on cpu.timeoutlock spinlock

Reported by: Vojtech Horky
Owned by: Jakub Jermář
Priority: major
Milestone: 0.5.0
Component: helenos/kernel/amd64
Version: mainline
Keywords: spinlock
Cc:
Blocker for:
Depends on:
See also:

Description

While trying to reproduce bugs #396 and #458, I hit the following assertion (see the attached picture for the full log):

######> Kernel panic on cpu3 due to a failed assertion: <######
irq_spinlock_trylock() at generic/src/synch/spinlock.c:262:
!lock->guard, cpu.timeoutlock

THE=...
...
...: generic/src/debug/panic.o:panic_common()
...: generic/src/synch/spinlock.o:irq_spinlock_trylock()
...: generic/src/time/timeout.o:timeout_unregister()
...: generic/src/synch/waitq.o:waitq_unsleep()
...: generic/src/ipc/sysipc.o:sys_ipc_poke()
...: generic/src/syscall/syscall.o:syscall_handler()
...: arch/amd64/src/asm.o:syscall_entry()
...
cpu2: looping on spinlock ...:timeout_t_lock, \
    caller=... (generic/src/time/timeout.o:timeout_register)

Steps to reproduce

  1. Compile the default ia32 or amd64 configuration from current mainline (1486) or from lp:~jakub/helenos/mm (985).
  2. Run in QEMU:
    qemu -cdrom image.iso -smp 4 -m 1024 -net user \
      -device e1000,vlan=0 -redir tcp:2223::2223
    
  3. Configure networking in HelenOS
    inetcfg create 10.0.2.15/24 net/eth1 addr
    
  4. Connect to HelenOS from the host via
    telnet localhost 2223
    
  5. Start typing something in the telnet session, such as tester malloc1
  6. HelenOS will typically panic while the word malloc1 is being typed or soon after the tester is started.

Notes
This problem might be related to the bugs mentioned above, but since I can see no clear connection, I opened a new ticket.

I have tried the described steps several times and hit the assertion every time. The problem appears regardless of how much memory the guest machine has.

It appears that if the kernel load-balancing thread kcpulb is disabled (e.g. by adding a return after thread_detach(THREAD) in scheduler.c), the problem can be reproduced with only 2 CPUs. Otherwise, a machine with 4 CPUs is needed.

Attachments (1)

spinlock_guard_failed_assert.png (5.5 KB) - added by Vojtech Horky 12 years ago.
Kernel panic on cpu.timeoutlock spinlock

Change History (4)

Attachment spinlock_guard_failed_assert.png added by Vojtech Horky, 12 years ago

Kernel panic on cpu.timeoutlock spinlock

comment:1 by Jakub Jermář, 12 years ago

I think irq_spinlock_trylock() should not draw any conclusions from lock->guard if it didn't manage to grab the lock. In that case, the IRQ spinlock may be held by someone who took it with irq_dis (the second argument to irq_spinlock_lock()) set to true.

In a similar vein, it doesn't look appropriate that irq_spinlock_trylock() tests whether interrupts are disabled.
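
To illustrate the point about lock->guard, here is a minimal, single-threaded sketch in C. It is not the HelenOS code and not necessarily the fix that was merged: the type model_irq_lock_t and all model_* functions are hypothetical names invented for this example, and the guard flag is assumed to be set by a holder that locked with irq_dis true. The sketch only contrasts checking the guard unconditionally with checking it only after a successful acquisition.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified model of an IRQ spinlock (not the HelenOS type). */
typedef struct {
    atomic_flag locked; /* the underlying spinlock */
    bool guard;         /* assumed: set by a holder that locked with irq_dis == true */
} model_irq_lock_t;

/* Models a holder taking the lock with irq_dis == true, which sets the guard. */
static void model_lock_irq_dis(model_irq_lock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ; /* spin until the lock is free */
    l->guard = true;
}

static void model_unlock(model_irq_lock_t *l)
{
    l->guard = false;
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/*
 * Unconditional check: looks at the guard even when the trylock failed.
 * Returns true if an assertion of "!guard" would fire.
 */
static bool model_trylock_unconditional_check(model_irq_lock_t *l, bool *acquired)
{
    *acquired = !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
    return l->guard; /* may reflect the state left by the current holder */
}

/*
 * Conditional check: looks at the guard only if the lock was acquired,
 * so a guard set by another holder is never misread.
 */
static bool model_trylock_conditional_check(model_irq_lock_t *l, bool *acquired)
{
    *acquired = !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
    return *acquired && l->guard;
}
```

In this model, while another party holds the lock with the guard set, the unconditional variant reports a violation even though the caller never owned the lock, whereas the conditional variant simply reports a failed trylock.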

comment:2 by Jakub Jermář, 12 years ago

Milestone: 0.5.1 → 0.5.0

comment:3 by Jakub Jermář, 12 years ago

Resolution: fixed
Status: new → closed

Fix merged in mainline,1489.
