Opened 5 years ago

Closed 17 months ago

#752 closed defect (fixed)

hound crashes on a dual-CPU system

Reported by: Jakub Jermář Owned by:
Priority: major Milestone: 0.12.1
Component: helenos/srv/hound Version: mainline
Keywords: Cc:
Blocker for: Depends on:
See also:

Description

When booted with multiple CPUs, the following hound crash is reproducible after playing demo.wav using wavplay:

[hdaudio] note: hda_stop_playback()
[hound] warn: Connection failed to provide enough data 0/4096
[hdaudio] note: hda_stream_stop()
[hdaudio n]ote: hda_stream_rese()
Task /srv/audio/hound (t53) killed due to an exception at program counter 0x00000000004020bc.
cs =0x0000000000000023	rip=0x00000000004020bc	rfl=0x0000000000210246	err=0x0000000000000004
ss =0x000000000000001b
rax=0x0000000000000400	rbx=0xfffffffffffffff0	rcx=0x0000000000000004	rdx=0x0000000000000003
rsi=0x000000000095a000	rdi=0xfffffffffffffff0	rbp=0x0000000000a60e40	rsp=0x0000000000a60e10
r8 =0x0000000000000004	r9 =0x0000000000000000	r10=0x0000000000000000	r11=0x0000000000200216
r12=0x0000000000001000	r13=0x0000000000000020	r14=0x000000000095a000	r15=0x0000000000000000
0x0000000000a60e40: 0x00000000004020bc()
0x0000000000a60e90: 0x0000000000401b0c()
0x0000000000a60f40: 0x000000000040103d()
0x0000000000a60fd0: 0x0000000000413a2c()
0x0000000000a60ff0: 0x0000000000411399()
Kill message: Page fault: 0x0000000000000038.
[/srv/taskmon(16)] taskmon: Task 53 fault in thread 0xffffffff8458e3f8.
[/srv/taskmon(16)] taskmon: Executing /app/taskdump -t 53
[/app/taskdump(68)] Task Dump Utility
[/app/taskdump(68)] Dumping task '/srv/audio/hound' (task ID 53).
[/app/taskdump(68)] Loaded symbol table from /srv/audio/hound
[/app/taskdump(68)] Threads:
[/app/taskdump(68)]  [1] hash: 0xffffffff8458e3f8
[/app/taskdump(68)] Thread 0xffffffff8458e3f8: PC = 0x00000000004020bc (connection_add_source_data+76). FP = 0x0000000000a60e40
[/app/taskdump(68)]   0x0000000000a60e40: 0x00000000004020bc (connection_add_source_data+76)
[/app/taskdump(68)]   0x0000000000a60e90: 0x0000000000401b0c (audio_sink_mix_inputs+124)
[/app/taskdump(68)]   0x0000000000a60f40: 0x000000000040103d (device_event_callback+141)
[/app/taskdump(68)]   0x0000000000a60fd0: 0x0000000000413a2c (connection_fibril+124)
[/app/taskdump(68)]   0x0000000000a60ff0: 0x0000000000411399 (_fibril_main+41)

Attachments (1)

hound.log (30.7 KB ) - added by Jakub Jermář 5 years ago.

Download all attachments as: .zip

Change History (9)

comment:1 by Jakub Jermář, 5 years ago

I am going to attach a full console log from a system built with -O0 containing also the other fibril stacks.

by Jakub Jermář, 5 years ago

Attachment: hound.log added

comment:2 by Jakub Jermář, 5 years ago

It seems the following two stacks touch the same data structures, the first one without taking any locks:

The fibril that crashed:

audio_pipe_frames+43
connection_add_source_data+116
audio_sink_mix_inputs+163
device_event_callback+225
connection_fibril+288
_fibril_main+59

The fibril which is waiting for IPC:

context_swap+28
_fibril_switch_to+303
fibril_wait_timeout+553
fibril_wait_for+64
async_wait_for+62
async_req_fast+144
audio_pcm_stop_playback+81
device_sink_connection_callback+637
connection_destroy+170
hound_remove_source_internal+185
hound_remove_ctx+150
iface_rem_context+99
hound_connection_handler+493
connection_fibril+288
_fibril_main+59

comment:3 by Jakub Jermář, 5 years ago

Component: helenos/unspecifiedhelenos/srv/hound

comment:4 by Jakub Jermář, 5 years ago

I can reproduce this reliably on a default build when I add an artificial fibril_usleep() like this:

--- a/uspace/srv/audio/hound/audio_sink.c
+++ b/uspace/srv/audio/hound/audio_sink.c
@@ -139,6 +139,7 @@ void audio_sink_mix_inputs(audio_sink_t *sink, void *dest, size_t size)
 
        pcm_format_silence(dest, size, &sink->format);
        list_foreach(sink->connections, sink_link, connection_t, conn) {
+               fibril_usleep(1000);
                const errno_t ret = connection_add_source_data(
                    conn, dest, size, sink->format);
                if (ret != EOK) {

followed by running:

/ # wavplay demo.wav

Looks like the list_foreach() above should apparently take the hound->list_guard mutex.

comment:5 by Jakub Jermář, 5 years ago

Milestone: 0.8.0

comment:6 by Jiri Svoboda, 4 years ago

Not 100% sure if it's the same thing, but you can reproduce this when GZX exits. Also I have a very old branch on my laptop with an audio player (aplayer) that is having the same problem. With aplayer I worked around it by adding a delay before destroying the hound context. I believe the problem happens if you destroy the context before playback finishes.

comment:7 by Jakub Jermář, 18 months ago

Could this have been fixed in:

commit 2fbd49c61404ed7f2e7f36f65d920ef81281a58e (HEAD -> master, origin/master)
Author: Jiri Svoboda <jiri@wiwaxia>
Date:   Wed Sep 21 20:01:20 2022 +0200

    Audio synk needs locking
    
    Without locking the list of sink connections, we are exposed
    to a race between removing a connection at the end of playback
    (typically while destroying a hound context) and audio device
    event PCM_EVENT_FRAMES_PLAYED which causes audio mixing to occur
    via audio_sink_mix_inputs(), causing hound to crash often at
    the end of playback.

?

comment:8 by Jiri Svoboda, 17 months ago

Milestone: 0.12.1
Resolution: fixed
Status: newclosed

You are correct.

Fixed in changeset 2fbd49c61404ed7f2e7f36f65d920ef81281a58e.

Note: See TracTickets for help on using tickets.