Opened 13 years ago

Closed 12 years ago

#362 closed defect (notadefect)

Line debugging information breaks arm32

Reported by: Jiri Svoboda Owned by:
Priority: major Milestone: 0.5.0
Component: helenos/unspecified Version: mainline
Keywords: Cc:
Blocker for: Depends on:
See also:

Description

If you take the arm32/GXemul profile and enable CONFIG_LINE_DEBUG, the system stops responding around the time the command line comes up. Sometimes it reacts to the first few keys. The cursor stops blinking. There should be plenty of memory; I run GXemul with:

$ gxemul -M 256 $@ -E testarm -X image.boot

Attachments (1)

gxemul.patch (506 bytes), added by Jakub Jermář 12 years ago.


Change History (4)

comment:1 by Jiri Svoboda, 13 years ago

I get the same result with old GXemul 0.4.7.1 and with GXemul 0.6.0.

comment:2 by Jakub Jermář, 12 years ago

I think you may be hitting the following GXemul issue:

On 05/03/2010 10:50 PM, Jakub Jermar wrote:

I am now using GXemul 0.4.7.2 and in combination with the head revision of HelenOS,
I am observing a weird thing on the testarm target.

The system boots fine, but after about 20 seconds, it does not receive any more
interrupts. If I break into the integrated debugger and try a few 'step' commands,
or set a breakpoint and then 'continue', the system receives a few more interrupts
and continues to run normally for some time until the same problem happens again.

I used the 'trace' command to see what is going on during these hung periods. The kernel is
looping in the scheduler in a very tight loop which disables and then re-enables
interrupts, so that interrupts are enabled only for a couple of instructions (it is
basically idling).

I managed to reproduce the hang while I was having a breakpoint set at the IRQ exception
vector and verified that the processor was not getting any interrupts at that time
(neither from the timer nor from the keyboard).

I also tried to narrow down the issue on the GXemul side, so I put some debugging prints
into GXemul, but the code in timer.c seemed to be ticking ok.

Do you have any idea of what could have gone wrong? What is notable is that using the
integrated debugger can fix the problem for a while.


On 05/06/2010 08:02 AM, Anders Gavare wrote:

On Mon 2010-05-03 at 22:50 +0200, Jakub Jermar wrote:

> Do you have any idea of what could have gone wrong?

Yes. If you look for

int DYNTRANS_RUN_INSTR_DEF(struct cpu *cpu)

in cpu_dyntrans.c, you'll see that interrupt delivery is an ugly hack.
Instead of checking for interrupts for _every_ instruction, like a
cycle-accurate instruction simulator would do, GXemul (and I guess other
emulators as well) only looks "now and then". Unfortunately, this
mechanism may be unreliable in some situations.

> What is notable is that using the
> integrated debugger can fix the problem for a while.

That's because breaking into the debugger and then continuing usually
resets dyntrans counters. So you "align" the interrupt check
differently.

One way to deal with this is to add checks to the CPU-specific interrupt
enable/disable mechanism (different for each arch), and "queue up"
interrupts. (That way, if interrupts are quickly enabled and then
disabled, they won't be missed, which is what you are seeing now.) Then, instead
of checking whether interrupts are enabled only at dyntrans entry, one
could check for such queued-up interrupts.

However, for the future: I've had problems with this in the past as
well, and since the 0.6.x framework has no interrupt subsystem yet, this
is probably one thing that should be dealt with better in the design of
the 0.6.x stuff. In general, 0.6.x will be more aimed towards
cycle-accuracy/replayability, and interrupt accuracy comes into that as
well. But it will take time before that is working.


On 05/06/2010 09:55 PM, Jakub Jermar wrote:

On 05/06/2010 08:02 AM, Anders Gavare wrote:

> Yes. If you look for
>
> int DYNTRANS_RUN_INSTR_DEF(struct cpu *cpu)
>
> in cpu_dyntrans.c, you'll see that interrupt delivery is an ugly hack.
> Instead of checking for interrupts for _every_ instruction, like a
> cycle-accurate instruction simulator would do, GXemul (and I guess other
> emulators as well) only looks "now and then". Unfortunately, this
> mechanism may be unreliable in some situations.

Oh, I see. The alignment between the HelenOS scheduling loop and GXemul was
such that GXemul always evaluated:

    if (cpu->cd.arm.irq_asserted && !(cpu->cd.arm.cpsr & ARM_FLAG_I))
            arm_exception(cpu, ARM_EXCEPTION_IRQ);

while interrupts were disabled.

I tried to bring a bit of randomness to get GXemul out of this stereotype by
adding one more condition for single-stepping:

    if (single_step || cpu->machine->instruction_trace
        || cpu->machine->register_dump || (random() % 15) == 1) {

And it seems to have helped.

> One way to deal with this is to add checks to the CPU-specific interrupt
> enable/disable mechanism (different for each arch), and "queue up"
> interrupts. (That way, if interrupts are quickly enabled and then
> disabled, they won't be missed, which is what you are seeing now.) Then, instead
> of checking whether interrupts are enabled only at dyntrans entry, one
> could check for such queued-up interrupts.

Well, from our point of view it would be sufficient to have any working
solution to this problem. The queued interrupts you describe above look
like a good idea to me.


I am attaching a simple patch that fixed this for me (even though I am now unable to reproduce the issue without the patch).

by Jakub Jermář, 12 years ago

Attachment: gxemul.patch added

comment:3 by Jakub Jermář, 12 years ago

Resolution: notadefect
Status: new → closed

Closing; feel free to reopen if this is not the assumed GXemul issue.
