Opened 8 months ago
Last modified 6 months ago
#866 new defect
console input freezes on ia64/ski after GCC upgrade
| Reported by: | Jiří Zárevúcky | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | 0.14.2 |
| Component: | helenos/kernel/ia64 | Version: | mainline |
| Keywords: | | Cc: | |
| Blocker for: | | Depends on: | |
| See also: | | | |
Description
It seems that after we upgraded GCC to version 13.2, console input in the IA-64 ski emulator always freezes a couple of seconds after boot. With the old GCC 8.2.0, this doesn't happen, as far as I can tell.
Change History (7)
comment:1 by , 6 months ago
Milestone: 0.13.1 → 0.14.1
comment:2 by , 6 months ago
I can confirm this. I booted up and ran 'ls', and it hung halfway through listing the directory.
However, you can still switch to kconsole, and it works. You can also still switch between virtual terminals with the Fn keys (although none of them respond to keyboard input).
Maybe some server crashed? (VFS?)
comment:3 by , 6 months ago
HelenOS release 0.12.1 (Cathode), revision a29af3775
Built on 2024-03-19 08:45:14
Running on ia64 (term/vc0)
Copyright (c) 2001-2022 HelenOS project
Welcome to HelenOS!
http://www.helenos.org/
Type 'help' [Enter] to see a few survival tips.

/ # ls
app <dir>
boot <dir>
cfg <dir>
data <dir>
drv <dir>
gzx <dir>

kconsole> tasks
[id ] [name ] [ctn] [address ] [as ]
1 kernel 0 0xe0000000040ec000 0xe00000000406c000
2 init:ns 0 0xe0000000040ec488 0xe00000000406c0a8
4 init:locsrv 0 0xe0000000040ecd98 0xe00000000406c1f8
5 init:rd 0 0xe0000000040ed220 0xe00000000406c2a0
6 init:vfs 0 0xe0000000040ed6a8 0xe00000000406c348
7 init:logger 0 0xe0000000040edb30 0xe00000000406c3f0
8 init:ext4fs 0 0xe0000000040edfb8 0xe00000000406c498
9 /srv/fs/tmpfs 0 0xe0000000040ee440 0xe00000000406c540
10 /srv/fs/exfat 0 0xe0000000040ee8c8 0xe00000000406c5e8
11 /srv/fs/fat 0 0xe0000000040eed50 0xe00000000406c690
12 /srv/fs/cdfs 0 0xe0000000040ef1d8 0xe00000000406c738
13 /srv/fs/mfs 0 0xe0000000040ef660 0xe00000000406c7e0
14 /srv/klog 0 0xe0000000040efae8 0xe00000000406c888
15 /srv/fs/locfs 0 0xe000000007ec0000 0xe00000000406c930
16 /srv/taskmon 0 0xe000000007ec0488 0xe00000000406c9d8
17 /srv/devman 0 0xe000000007ec0910 0xe00000000406ca80
18 /drv/root/root 0 0xe000000007ec0d98 0xe00000000406cb28
19 /srv/hid/s3c24xx_ts 0 0xe000000007ec1220 0xe00000000406cbd0
20 /drv/virt/virt 0 0xe000000007ec16a8 0xe00000000406cc78
21 /drv/ski/ski 0 0xe000000007ec1b30 0xe00000000406cd20
22 /srv/bd/vbd 0 0xe000000007ec1fb8 0xe00000000406cdc8
23 /drv/kfb/kfb 0 0xe000000007ec2440 0xe00000000406ce70
24 /drv/ski-con/ski-con 0 0xe000000007ec28c8 0xe00000000406cf18
25 /srv/volsrv 0 0xe000000007ec2d50 0xe00000000406cfc0
26 /srv/net/loopip 0 0xe000000007ec31d8 0xe00000000406d068
27 /srv/net/ethip 0 0xe000000007ec3660 0xe00000000406d110
28 /srv/net/inetsrv 0 0xe000000007ec3ae8 0xe00000000406d1b8
29 /srv/net/tcp 0 0xe000000008ed0000 0xe00000000406d260
30 /srv/net/udp 0 0xe000000008ed0488 0xe00000000406d308
31 /srv/net/dnsrsrv 0 0xe000000008ed0910 0xe00000000406d3b0
32 /srv/net/dhcp 0 0xe000000008ed0d98 0xe00000000406d458
33 /srv/net/nconfsrv 0 0xe000000008ed1220 0xe00000000406d500
34 /srv/clipboard 0 0xe000000008ed16a8 0xe00000000406d5a8
35 /srv/hid/remcons 0 0xe000000008ed1b30 0xe00000000406d650
36 /srv/hid/input 0 0xe000000008ed1fb8 0xe00000000406d6f8
37 /srv/hid/output 0 0xe000000008ed2440 0xe00000000406d7a0
38 /srv/audio/hound 0 0xe000000008ed28c8 0xe00000000406d848
39 /srv/hid/console 0 0xe000000008ed2d50 0xe00000000406d8f0
40 /app/getterm 0 0xe000000008ed31d8 0xe00000000406d998
41 /app/getterm 0 0xe000000008ed3660 0xe00000000406da40
42 /app/bdsh 0 0xe000000008ed3ae8 0xe00000000406dae8
43 /app/getterm 0 0xe000000009ebc000 0xe00000000406db90
44 /app/bdsh 0 0xe000000009ebc488 0xe00000000406dc38
45 /app/getterm 0 0xe000000009ebc910 0xe00000000406dce0
46 /app/bdsh 0 0xe000000009ebcd98 0xe00000000406dd88
47 /app/getterm 0 0xe000000009ebd220 0xe00000000406de30
48 loader 0 0xe000000009ebd6a8 0xe00000000406ded8
49 /app/getterm 0 0xe000000009ebdb30 0xe00000000406df80
50 loader 0 0xe000000009ebdfb8 0xe00000000406e028
51 loader 0 0xe0000000040ec910 0xe00000000406c150
kconsole>
comment:4 by , 6 months ago
In the listing above, vfs is still present.
However, if I try to minimize the list of tasks started from init, the behavior changes.
Starting fewer VTs means I don't even get a command line.
If I remove most other things (networking, filesystems, etc.), the problem shows up differently: typing a character (e.g. 'A') into the active VT prints a space, and typing Enter prints several spaces (a tab?).
If I try to start a UI application from bdsh, the system usually freezes before it starts (except Hello). But if I start it from init instead, it starts and keeps working (e.g. Task Bar, Calculator). Note: these applications don't use stdout/vfs for output.
However, Task Bar then fails to start any application (the launch returns right away).
In these cases, if I press F12 and print the list of tasks, vfs is not listed.
This suggests it crashed.
The problem is that the ski kernel console doesn't keep history, so we can't see the crash message.
Unless…
If I press F12 before the problem occurs, I get:
Task init:vfs (6) killed due to an exception at program counter 0x400000000001d240.
ar.bsp=0xe00000000416c0b8 ar.bspstore=0x60000000002f4018
ar.rnat=0x0 ar.rsc=0xc
ar.ifs=0x8000000000000288 ar.pfs=0xc000000000000288
cr.isr=0x400000000 cr.ipsr=0x1013080a6010
cr.iip=0x400000000001d240, #0 (<unknown>)
cr.iipa=0x400000000001f2e0 (<unknown>)
cr.ifa=0x400 (<unknown>)
Kill message: Page fault: 0x0000000000000400.
comment:5 by , 6 months ago
0x400000000001d240 is inside fibril_mutex_unlock:
400000000001d200 <fibril_mutex_unlock>:
400000000001d200: 04 18 21 0a 80 05 [MLX] alloc r35=ar.pfs,8,5,0
400000000001d206: 00 00 00 00 60 c0 movl r38=0xe00000000001d1a8
400000000001d20c: 84 22 8c 6e
400000000001d210: 05 20 01 02 00 21 [MLX] mov r36=r1
400000000001d216: 00 00 00 00 00 a0 movl r37=0x10738;;
400000000001d21c: 84 23 38 60
400000000001d220: 08 30 05 4c 00 20 [MMI] add r38=r1,r38
400000000001d226: 50 0a 94 00 40 40 add r37=r1,r37
400000000001d22c: 04 00 c4 00 mov r34=b0
400000000001d230: 19 00 00 00 01 00 [MMB] nop.m 0x0
400000000001d236: 00 00 00 02 00 00 nop.m 0x0
400000000001d23c: 18 1f 00 50 br.call.sptk.many b0=400000000001f140 <__futex_lock>;;
400000000001d240: 11 08 01 40 18 10 [MIB] ld8 r33=[r32]
400000000001d246: 10 00 90 00 42 00 mov r1=r36
400000000001d24c: 08 e2 ff 58 br.call.sptk.many b0=400000000001b440 <fibril_get_id>;;
400000000001d250: 09 30 20 42 07 38 [MMI] cmp.eq p6,p7=r8,r33
400000000001d256: 00 00 00 02 00 20 nop.m 0x0
400000000001d25c: 00 20 01 84 mov r1=r36;;
400000000001d260: 04 00 00 00 01 00 [MLX] nop.m 0x0
400000000001d266: 00 00 00 00 e0 c3 (p07) movl r38=0xe00000000001d260
400000000001d26c: 04 26 90 6e
400000000001d270: e5 38 2d 01 01 24 [MLX] (p07) mov r39=203
comment:6 by , 6 months ago
I think it's crashing here:
static void _fibril_mutex_unlock_unsafe(fibril_mutex_t *fm)
{
    assert(fm->oi.owned_by == (fibril_t *) fibril_get_id());  // <--- NULL pointer dereference??

    if (fm->counter++ < 0) {
        awaiter_t *wdp = list_pop(&fm->waiters, awaiter_t, link);
        assert(wdp);

        fibril_t *f = (fibril_t *) wdp->fid;
        fm->oi.owned_by = f;
        f->waits_for = NULL;

        fibril_notify(&wdp->event);
    } else {
        fm->oi.owned_by = NULL;
    }
}
comment:7 by , 6 months ago
Milestone: 0.14.1 → 0.14.2
When I try to instrument the code to get more information, it fails in a different spot. We need a better way of debugging, e.g. at least dumping the raw stack when a task crashes.
With -O2, the problem goes away. I tried to determine which specific optimization causes it, but failed: when I compile with -O3 but explicitly disable all the additional optimizations that the man page lists as enabled by -O3, the problem persists.
I will use -O2 in the build defaults as a workaround so that we can make the release. Afterwards, I will revert to -O3 so that we don't forget about this; I would like to continue investigating.
Milestone renamed