Opened 13 years ago
Closed 13 years ago
#345 closed defect (fixed)
sun4u: Boot failure on a real Sun Ultra 60
Reported by: | Martin Decky | Owned by: | Jakub Jermář |
---|---|---|---|
Priority: | major | Milestone: | 0.5.0 |
Component: | helenos/kernel/sparc64 | Version: | mainline |
Keywords: | Cc: | ||
Blocker for: | Depends on: | ||
See also: |
Description
HelenOS revision 992 (default configuration for Ultra) does not boot on a real world Sun Ultra 60. The boot process ends with "Booting kernel …", but the machine cannot be reverted back to OBP.
Change History (11)
comment:1 by , 13 years ago
comment:2 by , 13 years ago
Last known working revision is mainline,900 (i.e. the release bump to 0.4.3), revision mainline,904 is already broken. Thus it boils down very probably to the adoption of GCC 4.6 and binutils 2.21 and the related modifications. I'll try to bisect the merge commit mainline,904 deeper into its components and also downgrade to GCC 4.5.x to investigate the root cause.
comment:3 by , 13 years ago
OK. Either I have just discovered a bug in GCC or a bug in my brain. After applying the following patch the system boots fine again on my Ultra 60. The type "size_t" is defined as "unsigned long int", but I don't see any reasons for an overflow here. I have also checked all places where memset() or memsetb() gets called, but everything looks fine.
The assembly generated for the "unsigned int" variant of i is certainly much simpler, but I just don't see the catch. Any clues please? Maybe the memory model (currently medlow)?
=== modified file 'kernel/generic/src/lib/memfnc.c' --- kernel/generic/src/lib/memfnc.c 2011-03-29 21:38:40 +0000 +++ kernel/generic/src/lib/memfnc.c 2011-05-25 16:58:37 +0000 @@ -55,7 +55,7 @@ */ void *memset(void *dst, int val, size_t cnt) { - size_t i; + unsigned int i; uint8_t *ptr = (uint8_t *) dst; for (i = 0; i < cnt; i++)
comment:4 by , 13 years ago
That's strange.
Note that mainline does work on Qemu/sparc64/OpenBIOS even with size_t i.
comment:5 by , 13 years ago
NB: It's the other way around. It DOES NOT work with "size_t i", but it DOES work with "unsigned int i". Strange indeed.
comment:6 by , 13 years ago
I noticed the failing memset() uses floating point registers:
436dd8: d1 1e 40 01 ldd [ %i1 + %g1 ], %f8 401df8: 94 07 40 1d add %i5, %i5, %o2 436ddc: 84 00 a0 01 inc %g2 401dfc: c6 58 a0 10 ldx [ %g2 + 0x10 ], %g3 436de0: 80 a0 c0 02 cmp %g3, %g2 401e00: 82 06 c0 1c add %i3, %i4, %g1 436de4: d1 3e 00 01 std %f8, [ %i0 + %g1 ]
whereas the non-failing one does not. Just guessing, but cannot this be the problem wrt. to generating the fp_disabled exception in sensitive places?
follow-up: 9 comment:7 by , 13 years ago
NB: The kernel explicitly disables FPU when it starts:
wrpr %g0, PSTATE_PRIV_BIT, %pstate ! disable interrupts and disable ! 32-bit address masking
So the first memset() (the one called from as_init()?) has the potential to cause fp_disabled() during the kernel startup. fp_disable() will, under certain circumstances, lead to calling scheduler_fpu_lazy_request(), which touches CPU. However, cpu_init() is called after as_init() so this will like result in a panic.
We could either:
- make sure memset() does not use FPU, or
- enable FPU when the kernel starts
It is not clear, however, whether the latter would be sufficient.
comment:8 by , 13 years ago
Here is a patch which should prevent the compiler from autonomously using floats:
=== modified file 'kernel/arch/sparc64/Makefile.inc' --- kernel/arch/sparc64/Makefile.inc 2010-06-28 22:35:53 +0000 +++ kernel/arch/sparc64/Makefile.inc 2011-05-25 19:45:21 +0000 @@ -30,7 +30,7 @@ BFD_ARCH = sparc BFD = binary -GCC_CFLAGS += -m64 -mcpu=ultrasparc -mcmodel=medlow +GCC_CFLAGS += -m64 -mcpu=ultrasparc -mcmodel=medlow -mno-fpu SUNCC_CFLAGS += -m64 -xarch=sparc -xregs=appl,no%float LFLAGS += -no-check-sections
Would be worth trying out on the ultra without the unsigned int hack.
comment:9 by , 13 years ago
Replying to jermar:
We could either:
- make sure memset() does not use FPU, or
- enable FPU when the kernel starts
It is not clear, however, whether the latter would be sufficient.
And it is not clear at all, whether the latter would be safe as it may corrupt the FPU state belonging to a thread. Looks like only viable way is the former option.
comment:10 by , 13 years ago
Good call, seems like a probable cause. I have just glanced through the disassembly, but I have missed the FPU registers. I'll test it ASAP.
comment:11 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Revision mainline,996 tested positively on Ultra 60.
This will require bisecting between 0.4.3 and mainline,958, where you first noticed the problem.