Opened 11 years ago

Closed 9 years ago

#509 closed defect (duplicate)

Applications crash in malloc() in the recent gta02 builds

Reported by: Jakub Jermář
Owned by: Jiri Svoboda
Priority: major
Milestone: 0.7.0
Component: helenos/kernel/arm32
Version: mainline
Keywords: gta02
Cc:
Blocker for:
Depends on:
See also: #638

Description

As of mainline,1741, but also prior to this revision, applications crash inside malloc() when running on gta02 or similar boards.

Examples of these crashes include the following call paths:

...
malloc()
async_send_fast()
log_msg()
...

or:

area_check()
malloc_internal()
malloc()
fat_idx_get_by_pos()
...

The crash happens on address 0x10, which suggests that a NULL pointer was passed to area_check().
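
To illustrate why a fault at 0x10 points at a NULL structure pointer, here is a minimal sketch. The struct layout is hypothetical and is not the real heap area structure used by malloc(); it only assumes a 32-bit target where some field sits 16 bytes into the structure. Reading that field through a NULL base produces an access to address 0x10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the real HelenOS heap area structure.
 * uint32_t stands in for 32-bit pointers so the offsets match an
 * arm32 target regardless of the host the sketch is compiled on.
 */
typedef struct {
	uint32_t start;   /* offset 0x00 */
	uint32_t end;     /* offset 0x04 */
	uint32_t prev;    /* offset 0x08 */
	uint32_t next;    /* offset 0x0c */
	uint32_t magic;   /* offset 0x10 */
} demo_area_t;

/* The field of interest must sit 16 bytes into the structure. */
static_assert(offsetof(demo_area_t, magic) == 0x10,
    "magic is expected at offset 0x10");

/*
 * Address the CPU would touch when reading a field at the given
 * offset through a structure pointer with the given base value.
 */
uint32_t demo_fault_address(uint32_t base, size_t field_offset)
{
	return base + (uint32_t) field_offset;
}
```

Calling demo_fault_address(0, offsetof(demo_area_t, magic)) yields 0x10, the same shape as the fault address reported above.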

Given that other architectures and other arm machines don't have these problems, this issue looks gta02 specific.

Attachments (5)

SAM_1336.JPG (1.1 MB ) - added by Jakub Jermář 11 years ago.
Screenshot depicting one of these crashes, mainline,1741
fat.disasm (2.1 MB ) - added by Jakub Jermář 11 years ago.
fat.disasm, mainline,1741
klog.bz2 (90.8 KB ) - added by Jakub Jermář 10 years ago.
klog binary as of mainline,2094
root.bz2 (112.9 KB ) - added by Jakub Jermář 10 years ago.
root binary as of mainline,2094
kernel.raw.bz2 (160.4 KB ) - added by Jakub Jermář 10 years ago.
kernel image as of mainline,2094

Change History (11)

by Jakub Jermář, 11 years ago

Attachment: SAM_1336.JPG added

Screenshot depicting one of these crashes, mainline,1741

by Jakub Jermář, 11 years ago

Attachment: fat.disasm added

fat.disasm, mainline,1741

comment:1 by Jakub Jermář, 11 years ago

I thought I would add some perspective gained from a bit of bisecting the behaviour of the default mainline. It may or may not be relevant to this specific ticket:

1711 crashes heap [#509]     <= the symptom changes [large stack support]
1710 kernel panic (bad trap)
1708 kernel panic (bad trap)
1705 kernel panic (bad trap)
1699 kernel panic (bad trap)
1692 kernel panic (bad trap)
1689 kernel panic (bad trap)
1688 kernel panic (bad trap) <= first bad revision [uspace hash table]
1687 good
1685 good
1670 good, panics upon touching the touchscreen [first noticed on an earlier revision]
1641 decoder panic, reached compositor [the decoder bug was fixed in later revisions]

So from the above it follows that #509 started to show with mainline,1711. The tested revisions before that, back to mainline,1688, consistently panicked in ipc_call_free() on a bad kernel trap. Revisions before mainline,1688 were known to panic sometimes upon a touchscreen event.

Unfortunately, it looks as if there has been a latent bug in the GTA02 support (other arm32 machines do not appear to be susceptible to any of this) which was only exposed by the above-mentioned changes, namely mainline,1688 and, maybe, mainline,1711.

We should probably focus on the areas that are GTA02 specific to find the root cause of these issues.

comment:2 by Jakub Jermář, 10 years ago

As of mainline,2085, only the barebone build boots on my gta02, exhibiting the above symptoms.

comment:3 by Jakub Jermář, 10 years ago

This is the boot process as of mainline,2094 (barebone), captured using the debug board:

U-Boot 1.3.2-moko12 (May  9 2008 - 10:28:48)

I2C:   ready
DRAM:  128 MB
Flash:  2 MB
NAND:  256 MiB
Glamo core device ID: 0x3650, Revision 0x0002
USB:   S3C2410 USB Deviced
mtdparts variable not set, see 'help mtdparts'
mtdparts variable not set, see 'help mtdparts'
mtdparts variable not set, see 'help mtdparts'
mtdparts variable not set, see 'help mtdparts'
mtdparts variable not set, see 'help mtdparts'
mtdparts variable not set, see 'help mtdparts'
HelenOS bootloader, release 0.5.0 (Fajtl), revision 2094M (m.lombardi85@gmail.com-20140401090159-xc3ilz3z42u901lq)
Built on 2014-04-16 00:13:48 for arm32
Copyright (c) 2001-2014 HelenOS project
Boot data: 0x30010000 -> 0x30b1b6be

Memory statistics
 0x30015000|0x30015000: bootstrap stack
 0x30010000|0x30010000: bootstrap page table
 0x30015720|0x30015720: boot info structure
 0xb0a08000|0x30a08000: kernel entry point
 0x30015c24|0x30015c24: kernel image (527624/149104 bytes)
 0x3003a294|0x3003a294: ns image (219641/94154 bytes)
 0x3005125e|0x3005125e: loader image (217799/93713 bytes)
 0x3006806f|0x3006806f: init image (219682/94413 bytes)
 0x3007f13c|0x3007f13c: locsrv image (226939/98110 bytes)
 0x3009707a|0x3009707a: rd image (217268/93290 bytes)
 0x300adce4|0x300adce4: vfs image (234595/101552 bytes)
 0x300c6994|0x300c6994: logger image (223590/96005 bytes)
 0x300de099|0x300de099: ext4fs image (292525/125222 bytes)
 0x300fc9bf|0x300fc9bf: initrd image (29360128/10611967 bytes)

Inflating components ... initrd ext4fs logger vfs rd locsrv init loader ns kernel .
Booting the kernel...
SPARTAN kernel, release 0.5.0 (Fajtl), revision 2094M (m.lombardi85@gmail.com-20140401090159-xc3ilz3z42u901lq)
Built on 2014-04-16 00:13:48 for arm32
Copyright (c) 2001-2014 HelenOS project
Detected 1 CPU(s), 131040 KiB free memory
Kernel console ready (press any key to activate)
Program loader at 0xf0200000
RAM disk at 0x30c52000 (size 29360128 bytes)
ns: HelenOS IPC Naming Service
ns: Accepting connections
init: HelenOS init
loc: HelenOS Location Service
rd: HelenOS RAM disk server
vfs: HelenOS VFS server
logger: HelenOS Logging Service
ext4fs: HelenOS ext4 file system server
loc: Accepting connections
logger: Accepting connections
rd: Found RAM disk at 0x30c52000, 29360128 bytes
rd: Accepting connections
vfs: Accepting connections
ext4fs: Accepting connections
init: Root filesystem mounted on / (ext4fs at bd/initrd)
init: Unable to stat /srv/tmpfs
init: Starting /srv/klog
init: Starting /srv/locfs
[kernel/other] note: Program loader at 0xf0200000
Task klog (9) killed due to an exception at program counter 0x0000acec.
r0 =0x00000000  r1 =0x002360a0  r2 =0x00000000  r3 =0x00000058
r4 =0x002360a0  r5 =0x00000010  r6 =0x00000000  r7 =0x0002e000
r8 =0x00000040  r9 =0x0013204c  r10=0x00000000  fp =0x00233ecc
r12=0x00233ed0  sp =0x00233e90  lr =0x0000b198  spsr=0x20000050
0x00233ecc: 0x0000acec()
0x00233efc: 0x0000b728()
0x00233f14: 0x0000bb44()
0x00233f7c: 0x000105dc()
0x00233fbc: 0x00006b48()
0x00233fdc: 0x000010ec()
0x00233ff4: 0x00001b6c()
Kill message: Page fault: 0x00000010.
locfs: HelenOS Device Filesystem
locfs: Accepting connections
init: Unable to stat /srv/taskmon
init: Location service filesystem mounted on /loc (locfs)
init: Temporary filesystem unknown type (tmpfs)
init: Starting /srv/devman
devman: HelenOS Device Manager
devman: Accepting connections.
root: HelenOS root device driver
init: Unable to stat /srv/apic
init: Unable to stat /srv/i8259
[devman] note: The `root' driver was successfully registered as running.
init: Unable to stat /srv/obio
init: Unable to stat /srv/cuda_adb
init: Unable to stat /srv/s3c24xx_uart
Task root (12) killed due to an exception at program counter 0x0001d874.
r0 =0x00000000  r1 =0x002440a0  r2 =0x00000000  r3 =0x00000058
r4 =0x002440a0  r5 =0x00000010  r6 =0x00000000  r7 =0x0003c000
r8 =0x00000040  r9 =0x0003cb3c  r10=0x00000000  fp =0x00241dec
r12=0x00241df0  sp =0x00241db0  lr =0x0001dd20  spsr=0x20000050
0x00241dec: 0x0001d874()
0x00241e1c: 0x0001e2b0()
0x00241e34: 0x0001e6cc()
0x00241e9c: 0x00023568()
0x00241edc: 0x00019808()
0x00241ef8: 0x00002dd4()
0x00241f34: 0x00001144()
0x00241fa4: 0x00001ce0()
0x00241fdc: 0x0001fc30()
0x00241ff4: 0x000145d0()
Kill message: Page fault: 0x00000010.
init: Starting /srv/s3c24xx_ts
s3c24xx_ts: S3C24xx touchscreen driver
s3c24xx_ts: device at physical address 0x58000000, inr 31.
s3c24xx_ts: Registered device hid/mouse.
s3c24xx_ts: Accepting connections
init: Unable to stat /srv/loopip
init: Unable to stat /srv/ethip
init: Unable to stat /srv/inetsrv
init: Unable to stat /srv/tcp
init: Unable to stat /srv/udp
init: Unable to stat /srv/dnsrsrv
init: Unable to stat /srv/dhcp
init: Unable to stat /srv/nconfsrv
init: Unable to stat /srv/clipboard
init: Unable to stat /srv/remcons
init: Starting /srv/input
input: HelenOS input service
input: Could not find any suitable input device

######> Kernel panic on cpu0 due to a failed assertion: <######
waitq_sleep_timeout() at generic/src/synch/waitq.c:264:
(!PREEMPTION_DISABLED) || (PARAM_NON_BLOCKING(flags, usec))

THE=0xb0586000: pd=2 thread=0xb0325a00 task=0xb0584000 cpu=0xb02eb400 as=0xb0009294 magic=0xfacefeed
thread="uinit"
task="input"
0xb0587c14: generic/src/debug/stacktrace.o:stack_trace()+0x0000001c
0xb0587c44: generic/src/debug/panic.o:panic_common()+0x000001ac
0xb0587c84: generic/src/synch/waitq.o:waitq_sleep_timeout()+0x00000154
0xb0587c94: generic/src/synch/semaphore.o:_semaphore_down_timeout()+0x00000010
0xb0587cdc: generic/src/synch/mutex.o:_mutex_lock_timeout()+0x0000003c
0xb0587d1c: generic/src/mm/as.o:as_page_fault()+0x00000068
0xb0587d4c: arch/arm32/src/mm/page_fault.o:data_abort()+0x00000210
0xb0587d84: generic/src/interrupt/interrupt.o:exc_dispatch()+0x00000104
0xb0587d9c: arch/arm32/src/ras.o:ras_check()+0x00000030
0xb0587e3c: arch/arm32/src/exc_handler.o:data_abort_exception_entry()+0x000000b4
0xb0587e84: generic/src/mm/slab.o:_slab_free()+0x000000f8
0xb0587e9c: generic/src/ipc/ipc.o:ipc_call_free()+0x0000003c
0xb0587ef4: generic/src/ipc/sysipc.o:sys_ipc_wait_for_call()+0x000001b0
0xb0587f3c: generic/src/syscall/syscall.o:syscall_handler()+0x000000cc
0xb0587f64: arch/arm32/src/exception.o:swi_exception()+0x00000034
0xb0587f9c: generic/src/interrupt/interrupt.o:exc_dispatch()+0x00000104
0xb0587fb4: arch/arm32/src/ras.o:ras_check()+0x00000030
cpu0: halted

We can see that klog and root tasks crashed (binaries to be attached) and the kernel crashed. Non-barebone builds end with:

Inflating components ... initrd 
initrd: Inflating error -14

by Jakub Jermář, 10 years ago

Attachment: klog.bz2 added

klog binary as of mainline,2094

by Jakub Jermář, 10 years ago

Attachment: root.bz2 added

root binary as of mainline,2094

by Jakub Jermář, 10 years ago

Attachment: kernel.raw.bz2 added

kernel image as of mainline,2094

comment:4 by Martin Decky, 9 years ago

Milestone: 0.6.0 → 0.7.0

comment:5 by Jakub Jermář, 9 years ago

Running mtest from the u-boot prompt reports memory errors for the following addresses:

0x33ED8198
0x33ED819C
0x33ED81A0
0x33ED81A4
0x33ED81A8

This is approximately 62.8 MB into physical memory. I am going to test whether disabling this page will rid us of at least some of these issues.
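
The arithmetic behind the 62.8 MB figure can be sketched as follows. The DRAM base of 0x30000000 is an assumption inferred from the physical addresses in the boot log in comment:3, and the 4 KiB page size is likewise assumed. The helper names are made up for this illustration.

```c
#include <stdint.h>

/* Assumption: DRAM starts here, inferred from the boot log addresses. */
#define DEMO_DRAM_BASE  0x30000000u
/* Assumption: 4 KiB pages. */
#define DEMO_PAGE_SIZE  4096u

/* Byte offset of a physical address from the start of DRAM. */
uint32_t demo_dram_offset(uint32_t phys)
{
	return phys - DEMO_DRAM_BASE;
}

/* The same offset expressed in units of 1024 * 1024 bytes. */
double demo_dram_offset_mb(uint32_t phys)
{
	return demo_dram_offset(phys) / (1024.0 * 1024.0);
}

/* Base address of the page that contains the given physical address. */
uint32_t demo_page_base(uint32_t phys)
{
	return phys & ~(DEMO_PAGE_SIZE - 1u);
}
```

Under these assumptions, all five failing addresses fall into the single page starting at 0x33ED8000, which is why disabling one page would cover them.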

comment:6 by Jakub Jermář, 9 years ago

Resolution: duplicate
See also: #638
Status: new → closed

The crashes go away when #638 is worked around / fixed. Makes sense: accidentally letting old mappings of deallocated pages linger in the TLB is a recipe for disaster. Closing as duplicate of #638.
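
As a rough illustration of why a lingering TLB entry is dangerous, here is a toy model with one page table entry and one cached translation. It is not HelenOS code and does not reproduce the actual defect in #638; all names are hypothetical. It only shows that if a page is unmapped without invalidating the cached entry, later accesses still resolve to the old frame, which may already have been handed to someone else.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NO_FRAME UINT32_MAX

/* Toy MMU: a single virtual page with one cached translation. */
typedef struct {
	uint32_t pt_frame;   /* page table mapping, TOY_NO_FRAME if unmapped */
	uint32_t tlb_frame;  /* frame cached in the TLB */
	bool tlb_valid;      /* whether the cached entry is usable */
} toy_mmu_t;

/* Install a mapping in the page table only. */
void toy_map(toy_mmu_t *m, uint32_t frame)
{
	m->pt_frame = frame;
}

/* Translate: a TLB hit never consults the page table. */
uint32_t toy_translate(toy_mmu_t *m)
{
	if (m->tlb_valid)
		return m->tlb_frame;

	if (m->pt_frame != TOY_NO_FRAME) {
		/* TLB miss with a valid mapping: cache it. */
		m->tlb_frame = m->pt_frame;
		m->tlb_valid = true;
	}

	return m->pt_frame;
}

/* Remove the mapping, optionally invalidating the cached entry. */
void toy_unmap(toy_mmu_t *m, bool invalidate_tlb)
{
	m->pt_frame = TOY_NO_FRAME;
	if (invalidate_tlb)
		m->tlb_valid = false;
}
```

In this model, calling toy_unmap() with invalidate_tlb set to false leaves toy_translate() returning the old frame, while passing true makes the translation fail as it should.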
