Opened 11 years ago
Closed 10 years ago
#606 closed defect (fixed)
VFS sometimes crashes in fibril_switch() on sun4v
| Reported by: | Jakub Jermář | Owned by: | Jakub Jermář |
|---|---|---|---|
| Priority: | major | Milestone: | 0.7.0 |
| Component: | helenos-build/sparc64 | Version: | mainline |
| Keywords: | sun4v | Cc: | |
| Blocker for: | | Depends on: | |
| See also: | #324 | | |
Description
After mainline,1921, mainline,1922 and mainline,1923, HelenOS/sun4v can make it quite far into userspace initialization. As far as stability is concerned, the only problem seems to be around this area in fibril.c:
    fibril_t *srcf = __tcb_get()->fibril_data;
    if (stype != FIBRIL_FROM_DEAD) {
        /* Save current state */
        if (!context_save(&srcf->ctx)) {
            if (serialization_count)
                srcf->flags &= ~FIBRIL_SERIALIZED;
            if (srcf->clean_after_me) {    <========== HERE
                /*
                 * Cleanup after the dead fibril from which we
                 * restored context here.
                 */
                void *stack = srcf->clean_after_me->stack;    <=========== or HERE
                if (stack) {
                    /*
                     * This check is necessary because a
                     * thread could have exited like a
                     * normal fibril using the
                     * FIBRIL_FROM_DEAD switch type. In that
                     * case, its fibril will not have the
                     * stack member filled.
                     */
Either srcf->clean_after_me or srcf->clean_after_me->stack contains garbage (an unaligned or unmapped address).
The corresponding disasm is here:
c6b4: 82 10 00 07 mov %g7, %g1
c6b8: c4 5f a8 7f ldx [ %fp + 0x87f ], %g2
c6bc: c2 58 60 08 ldx [ %g1 + 8 ], %g1
c6c0: 80 a0 a0 03 cmp %g2, 3
c6c4: 02 40 00 73 be,pn %icc, c890 <fibril_switch+0x250>
c6c8: c2 77 a7 f7 stx %g1, [ %fp + 0x7f7 ]
c6cc: 40 00 53 f5 call 216a0 <context_save>
c6d0: 90 00 60 10 add %g1, 0x10, %o0
c6d4: 80 a2 20 00 cmp %o0, 0
c6d8: 12 40 00 a1 bne,pn %icc, c95c <fibril_switch+0x31c>
c6dc: 03 00 00 00 sethi %hi(0), %g1
c6e0: 82 18 7f e8 xor %g1, -24, %g1
c6e4: c2 01 c0 01 ld [ %g7 + %g1 ], %g1
c6e8: 80 a0 60 00 cmp %g1, 0
c6ec: 12 48 00 2f bne %icc, c7a8 <fibril_switch+0x168>
c6f0: c8 5f a7 f7 ldx [ %fp + 0x7f7 ], %g4
c6f4: ca 5f a7 f7 ldx [ %fp + 0x7f7 ], %g5
c6f8: fa 59 60 c8 ldx [ %g5 + 0xc8 ], %i5 <======== here %g5 is misaligned
c6fc: 22 c7 40 10 brz,a,pn %i5, c73c <fibril_switch+0xfc>
c700: b0 10 20 01 mov 1, %i0
c704: d0 5f 60 a8 ldx [ %i5 + 0xa8 ], %o0
c708: 02 c2 00 06 brz,pn %o0, c720 <fibril_switch+0xe0>
c70c: 01 00 00 00 nop
c710: 7f ff e8 b4 call 69e0 <as_area_destroy>
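The "misaligned" annotation above can be checked with a little arithmetic. On SPARC V9, `ldx` is a 64-bit load and its effective address must be 8-byte aligned. The two loads in the listing use the offsets 0xc8 (presumably `clean_after_me`) and 0xa8 (presumably `stack`), both multiples of 8, so a misaligned effective address means the base pointer itself is bad. The following is only a sketch of that check: `is_aligned()` and `ldx_would_trap()` are hypothetical helpers, not HelenOS functions, and the mapping of offsets to struct members is an assumption read off the disassembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offsets taken from the listing: ldx [%g5 + 0xc8] and ldx [%i5 + 0xa8]. */
#define OFF_CLEAN_AFTER_ME  0xc8
#define OFF_STACK           0xa8

/* Hypothetical helper: true if addr is a multiple of a power-of-two alignment. */
static bool is_aligned(uintptr_t addr, uintptr_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (addr & (alignment - 1)) == 0;
}

/*
 * Hypothetical helper: true if a 64-bit load from base + offset would
 * violate the 8-byte alignment that ldx requires.
 */
static bool ldx_would_trap(uintptr_t base, uintptr_t offset)
{
	return !is_aligned(base + offset, 8);
}
```

Since 0xc8 and 0xa8 are both 8-byte aligned, `ldx_would_trap(base, OFF_CLEAN_AFTER_ME)` is true exactly when `base` is not 8-byte aligned, which is consistent with the note that %g5 (the reloaded `srcf`) is the misaligned value.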
This crash can still occasionally be encountered in the CHT pre-integration branch as well:
http://bazaar.launchpad.net/~jakub/helenos/cht-preintegration/revision/2291
Change History (1)
comment:1 by , 10 years ago
| Component: | helenos/srv/vfs → helenos-build/sparc64 |
|---|---|
| Resolution: | → fixed |
| Status: | new → closed |

There was a bug in tlb_invalidate_pages(), fixed by mainline,2409, which was most likely causing this issue. As of mainline,2409, I was unable to reproduce the problem either under gem5 or on a real T1000.