Opened 13 years ago

Closed 13 years ago

#345 closed defect (fixed)

sun4u: Boot failure on a real Sun Ultra 60

Reported by: Martin Decky Owned by: Jakub Jermář
Priority: major Milestone: 0.5.0
Component: helenos/kernel/sparc64 Version: mainline
Keywords: Cc:
Blocker for: Depends on:
See also:

Description

HelenOS revision 992 (default configuration for Ultra) does not boot on a real world Sun Ultra 60. The boot process ends with "Booting kernel …", but the machine cannot be reverted back to OBP.

Change History (11)

comment:1 by Jakub Jermář, 13 years ago

This will require bisecting between 0.4.3 and mainline,958, where you first noticed the problem.

comment:2 by Martin Decky, 13 years ago

Last known working revision is mainline,900 (i.e. the release bump to 0.4.3), revision mainline,904 is already broken. Thus it boils down very probably to the adoption of GCC 4.6 and binutils 2.21 and the related modifications. I'll try to bisect the merge commit mainline,904 deeper into its components and also downgrade to GCC 4.5.x to investigate the root cause.

comment:3 by Martin Decky, 13 years ago

OK. Either I have just discovered a bug in GCC or a bug in my brain. After applying the following patch the system boots fine again on my Ultra 60. The type "size_t" is defined as "unsigned long int", but I don't see any reasons for an overflow here. I have also checked all places where memset() or memsetb() gets called, but everything looks fine.

The assembly generated for the "unsigned int" variant of i is certainly much simpler, but I just don't see the catch. Any clues please? Maybe the memory model (currently medlow)?

=== modified file 'kernel/generic/src/lib/memfnc.c'
--- kernel/generic/src/lib/memfnc.c     2011-03-29 21:38:40 +0000
+++ kernel/generic/src/lib/memfnc.c     2011-05-25 16:58:37 +0000
@@ -55,7 +55,7 @@
  */
 void *memset(void *dst, int val, size_t cnt)
 {
-       size_t i;
+       unsigned int i;
        uint8_t *ptr = (uint8_t *) dst;
        
        for (i = 0; i < cnt; i++)

comment:4 by Jakub Jermář, 13 years ago

That's strange.
Note that mainline does work on Qemu/sparc64/OpenBIOS even with size_t i.

comment:5 by Martin Decky, 13 years ago

NB: It's the other way around. It DOES NOT work with "size_t i", but it DOES work with "unsigned int i". Strange indeed.

comment:6 by Jakub Jermář, 13 years ago

I noticed the failing memset() uses floating point registers:

436dd8:       d1 1e 40 01     ldd  [ %i1 + %g1 ], %f8                              
401df8:       94 07 40 1d     add  %i5, %i5, %o2                                    |  436ddc:       84 00 a0 01     inc  %g2                                             
401dfc:       c6 58 a0 10     ldx  [ %g2 + 0x10 ], %g3                              |  436de0:       80 a0 c0 02     cmp  %g3, %g2                                        
401e00:       82 06 c0 1c     add  %i3, %i4, %g1                                    |  436de4:       d1 3e 00 01     std  %f8, [ %i0 + %g1 ]    

whereas the non-failing one does not. Just guessing, but cannot this be the problem wrt. to generating the fp_disabled exception in sensitive places?

Version 0, edited 13 years ago by Jakub Jermář (next)

comment:7 by Jakub Jermář, 13 years ago

NB: The kernel explicitly disables FPU when it starts:

        wrpr %g0, PSTATE_PRIV_BIT, %pstate      ! disable interrupts and disable
                                                ! 32-bit address masking

So the first memset() (the one called from as_init()?) has the potential to cause fp_disabled() during the kernel startup. fp_disable() will, under certain circumstances, lead to calling scheduler_fpu_lazy_request(), which touches CPU. However, cpu_init() is called after as_init() so this will like result in a panic.

We could either:

  • make sure memset() does not use FPU, or
  • enable FPU when the kernel starts

It is not clear, however, whether the latter would be sufficient.

comment:8 by Jakub Jermář, 13 years ago

Here is a patch which should prevent the compiler from autonomously using floats:

=== modified file 'kernel/arch/sparc64/Makefile.inc'
--- kernel/arch/sparc64/Makefile.inc	2010-06-28 22:35:53 +0000
+++ kernel/arch/sparc64/Makefile.inc	2011-05-25 19:45:21 +0000
@@ -30,7 +30,7 @@
 BFD_ARCH = sparc
 BFD = binary
 
-GCC_CFLAGS += -m64 -mcpu=ultrasparc -mcmodel=medlow
+GCC_CFLAGS += -m64 -mcpu=ultrasparc -mcmodel=medlow -mno-fpu
 SUNCC_CFLAGS += -m64 -xarch=sparc -xregs=appl,no%float
 
 LFLAGS += -no-check-sections

Would be worth trying out on the ultra without the unsigned int hack.

in reply to:  7 comment:9 by Jakub Jermář, 13 years ago

Replying to jermar:

We could either:

  • make sure memset() does not use FPU, or
  • enable FPU when the kernel starts

It is not clear, however, whether the latter would be sufficient.

And it is not clear at all, whether the latter would be safe as it may corrupt the FPU state belonging to a thread. Looks like only viable way is the former option.

comment:10 by Martin Decky, 13 years ago

Good call, seems like a probable cause. I have just glanced through the disassembly, but I have missed the FPU registers. I'll test it ASAP.

comment:11 by Martin Decky, 13 years ago

Resolution: fixed
Status: newclosed

Revision mainline,996 tested positively on Ultra 60.

Note: See TracTickets for help on using tickets.