
Opened 8 years ago

Last modified 3 years ago

#4 reopened defect

HelenOS/sparc64 unstable with CONFIG_TSB

Reported by: jermar Owned by: jermar
Priority: major Milestone: 0.7.1
Component: helenos/kernel/sparc64 Version: mainline
Keywords: Cc:
Blocker for: Depends on:
See also:

Description

I found out that when I double the size of the buffer allocated for the TSB, the problem disappears. However, the size used for TSB allocation seems right. Therefore, it seems like something is damaging the content of the TSB memory.
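One way to test the "something is damaging the TSB memory" hypothesis is to keep the doubled allocation but treat the extra half as a guard region: fill it with a known pattern and check later whether anything wrote into it. The sketch below is only an illustration in portable C, not HelenOS code. It assumes the UltraSPARC convention of 512 * 2^n entries of 16 bytes each per TSB; the ticket does not state the exact layout HelenOS uses, and all names (`tsb_bytes`, `guard_arm`, `guard_first_damage`, `GUARD_PATTERN`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: one TSB entry is an 8-byte tag plus 8 bytes of data. */
#define TSB_ENTRY_BYTES   16u
/* Assumed base entry count for a TSB size field of 0. */
#define TSB_BASE_ENTRIES  512u
/* Arbitrary (hypothetical) fill byte for the guard region. */
#define GUARD_PATTERN     0xa5u

/* Bytes needed for one TSB holding 512 * 2^size_field entries. */
static size_t tsb_bytes(unsigned size_field)
{
    assert(size_field <= 7);
    return ((size_t) TSB_BASE_ENTRIES << size_field) * TSB_ENTRY_BYTES;
}

/* Fill the guard region that directly follows the TSB proper. */
static void guard_arm(uint8_t *buf, size_t tsb_len, size_t guard_len)
{
    memset(buf + tsb_len, GUARD_PATTERN, guard_len);
}

/*
 * Return the offset (relative to the start of the guard region) of the
 * first byte that no longer holds the pattern, or guard_len if the
 * guard region is intact.
 */
static size_t guard_first_damage(const uint8_t *buf, size_t tsb_len,
    size_t guard_len)
{
    for (size_t i = 0; i < guard_len; i++) {
        if (buf[tsb_len + i] != GUARD_PATTERN)
            return i;
    }
    return guard_len;
}
```

If the guard region turns out damaged, the offset of the first bad byte hints at how far past the nominal end of the TSB the stray writes reach. If it stays intact while the doubled allocation still hides the problem, an overrun past the end becomes a less likely explanation.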

I still haven't seen this anywhere other than on one of the Ultra 60s.

Disabling TSB during compile time is a workaround for this bug.

After further investigation, I have come to the conclusion that the bug first appeared in revision 2161. More likely, though, an already existing bug was exposed when 2161 fixed another bug, one which prevented the TSB from functioning at all. So it looks like a TSB issue.

I have never seen this with r2128.
The earliest revision I saw this bug on is r2174.
I have not investigated the revisions in between yet.
The problem seems to be independent of whether the kernel was compiled with gcc 4.1.1 or gcc 4.1.2.
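The uninvestigated range between r2128 (never bad) and r2174 (bad) could be narrowed by bisection. The following is a minimal sketch of that search in C, assuming the bug stays present in every revision after the one that introduced or exposed it. The names `first_bad_revision`, `rev_is_bad_fn` and `bad_from_threshold` are hypothetical; in practice the predicate would stand for "build this revision, boot it on the Ultra 60, and see whether any of the symptoms show up".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback: returns true if the given revision shows the bug. */
typedef bool (*rev_is_bad_fn)(int rev, void *ctx);

/*
 * Binary search for the first bad revision.
 * Precondition: 'good' is known good, 'bad' is known bad, good < bad.
 * Invariant: 'good' always names a good revision, 'bad' a bad one.
 */
static int first_bad_revision(int good, int bad, rev_is_bad_fn is_bad,
    void *ctx)
{
    assert(good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/*
 * Stand-in predicate for illustration only: every revision at or above
 * the threshold stored in *ctx counts as bad.
 */
static bool bad_from_threshold(int rev, void *ctx)
{
    return rev >= *(const int *) ctx;
}
```

With 45 candidate revisions between r2128 and r2174, this needs about six test builds. Because the symptoms are intermittent, a revision should probably be booted several times before it is classified as good.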

I saw this only on one Ultra 60 when trying to boot revisions around 2233 from a CD-ROM.
What happened was one of the three scenarios:

  1. the kernel booted just fine, but the ns task got the data_access_error exception (as reported in klog) and died; several tasks died afterwards, most likely because they could not connect to ns; the kconsole was responsive in this case and I could investigate the content of the klog
  2. the kernel booted just fine, but the ns task exited and no exception was reported in klog; some other tasks died after ns exited; the kconsole was responsive in this case and I could investigate the content of the klog
  3. the kernel booted but then appeared to hang: no console task UI, and the kconsole was not responsive

Change History (12)

comment:1 Changed 8 years ago by jermar

  • Component set to kernel/sparc64

comment:2 Changed 8 years ago by jermar

  • Summary changed from Sudden death of userspace tasks to HelenOS/sparc64 unstable with CONFIG_TSB

The issue still exists with revision 4684, but I think it has slightly different symptoms considering the huge evolution step HelenOS made from 2233 to 4684.

comment:3 Changed 8 years ago by jermar

  • Milestone set to 0.5.0

comment:4 Changed 7 years ago by jermar

The respective Ultra 60 system ran fine (without any of the above symptoms) with the current version of HelenOS over the night, having the following load:

  • played tetris to around 4500 points
  • ran kernel and userspace tests
  • ran tester loop1 test
  • ran the factorial sysel example in an infinite loop

This morning, the system did not boot: it either hung, or killed the userspace tasks due to a data_access_error, or both. The data_access_error trap is a sign of a hardware problem (i.e. a machine check exception).

comment:5 Changed 6 years ago by jermar

  • Status changed from new to accepted

comment:6 Changed 6 years ago by jermar

  • Status changed from accepted to assigned

comment:7 Changed 6 years ago by jermar

  • Milestone changed from 0.5.0 to 0.5.1

comment:8 Changed 6 years ago by jermar

  • Resolution set to worksforme
  • Status changed from assigned to closed

Closing as not reproducible. This ticket has been reproducible only on one Ultra 60, which will shortly become unavailable to me. If the issue reproduces on some other machine, please file a new ticket with up-to-date data.

comment:9 Changed 6 years ago by jermar

  • Resolution worksforme deleted
  • Status changed from closed to reopened

Reopening as my new Ultra 60 (2x CPU, 2 GiB RAM) exhibits the same problem (mainline, 1018).

comment:10 Changed 6 years ago by jermar

The two CPUs identify as:

cpu0: manuf=UltraSPARC, impl=UltraSPARC II, mask=160 (450 MHz)
cpu1: manuf=UltraSPARC, impl=UltraSPARC II, mask=160 (450 MHz)

comment:11 Changed 5 years ago by jermar

  • Milestone changed from 0.5.0 to 0.5.1

comment:12 Changed 3 years ago by jermar

  • Milestone changed from 0.6.0 to 0.7.1