Opened 14 years ago

Closed 14 years ago

#227 closed defect (fixed)

HelenOS sometimes hard hangs shortly after boot

Reported by: Jakub Jermář Owned by: Jakub Jermář
Priority: critical Milestone: 0.4.3
Component: helenos/kernel/ia32 Version: mainline
Keywords: Cc:
Blocker for: Depends on:
See also:

Description

This is reproducible for instance in revision 402. The system very rarely hangs after most of or all vc console icons turn red. The system looks entirely hung, no error message is displayed even by the kernel (I modified the kconsole printing code to ignore the silent variable). I think this happened to me several times on a UP system, 4-processor and 7-processor systems.

Attachments (2)

qemu.log (67.1 KB ) - added by Jakub Jermář 14 years ago.
The suspicious part of QEMU trace.
kernel.raw.bz2 (155.9 KB ) - added by Jakub Jermář 14 years ago.
Kernel image with symbols matching the attached QEMU trace.

Download all attachments as: .zip

Change History (14)

comment:1 by Jakub Jermář, 14 years ago

Component: unspecifiedkernel/ia32

comment:2 by Jakub Jermář, 14 years ago

According to Qemu logs, it turns out the hang can be caused by a triple fault. Now, this finally something useful. I will make an update when I learn something more.

comment:3 by Jakub Jermář, 14 years ago

In the following analysis, I am using:

  • QEMU PC emulator version 0.12.3, Copyright © 2003-2008 Fabrice Bellard
  • C native compiler for building QEMU gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu9)
  • C cross compiler for building HelenOS gcc version 4.4.3 (GCC) (built by toolchain.sh)
  • HelenOS, ia32 defaults, head,418

The command to run HelenOS:

  • qemu -cdrom image.iso -smp 7 -d int,cpu_reset

After about ten invocations, I got the hang. Note that I got similar hangs (also caused by a triple fault), in a more reproducible way, by running top and then several instances of tasks -c until the system hung. Also note that I reproduced the same hang (likely) also on HelenOS 0.4.1, which should rule out some recent changes.

I am attaching the relevant portion of qemu.log from the point were things seems to go wrong until we hit the triple fault. I am also going to attach the respective kernel.raw from were symbols, addresses and instructions can be recovered by e.g. objdump.

by Jakub Jermář, 14 years ago

Attachment: qemu.log added

The suspicious part of QEMU trace.

by Jakub Jermář, 14 years ago

Attachment: kernel.raw.bz2 added

Kernel image with symbols matching the attached QEMU trace.

comment:4 by Jakub Jermář, 14 years ago

From the attached qemu.log, these are the interesting exceptions restricted to the suspicious address space CR3=0x65000:

166734
the processor is executing scheduler_separated_stack() (EIP=0x8011df75) and receives IRQ 0 (v=0x20), i.e. the clock interrupt. More precisely the place location in scheduler_separated_stack() corresponds to the piece of find_best_thread() where the scheduler halts the processor and waits for an interrupt to continue.
166734: v=20 e=0000 i=0 cpl=0 IP=0008:8011df75 pc=8011df75 SP=0010:8001bf8c EAX=00200046
EAX=00200046 EBX=fffff000 ECX=00000000 EDX=00000001
ESI=00000000 EDI=8005f540 EBP=8001bff4 ESP=8001bf8c
EIP=8011df75 EFL=00200202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9800 DPL=0 CS32 [---]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
GS =0030 00010188 ffffffff 00cff301 DPL=3 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0028 80040000 00000068 80408904 DPL=0 TSS32-avl
GDT=     8008b140 00000040
IDT=     80174560 00000200
CR0=80000019 CR2=0000b000 CR3=00065000 CR4=00000610
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
CCS=00000044 CCD=00000000 CCO=EFLAGS
166735
here the processor gets a page fault exception in an attempt to execute code on EIP=0x8011e7ab (i.e. the page with this EIP appears to be currently not mapped), which is the beginning of thread_exists(). This is already very strange because the kernel uses identity mapping for the entire physical memory, including the kernel text.
check_exception old: 0xffffffff new 0xe
166735: v=0e e=0000 i=0 cpl=0 IP=0008:8011e7ab pc=8011e7ab SP=0010:8001be40 CR2=8011e7ab
EAX=00000000 EBX=8012e591 ECX=00000000 EDX=800228c0
ESI=80022970 EDI=fffff000 EBP=8001be8c ESP=8001be40
EIP=8011e7ab EFL=00200002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9800 DPL=0 CS32 [---]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
GS =0030 00010188 ffffffff 00cff301 DPL=3 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0028 80040000 00000068 80408904 DPL=0 TSS32-avl
GDT=     8008b140 00000040
IDT=     80174560 00000200
CR0=80000019 CR2=8011e7ab CR3=00065000 CR4=00000610
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
CCS=0000002c CCD=8001be30 CCO=ADDL    
166750
the kernel gets a page fault exception when executing the first instruction of as_page_fault() (EIP=0x80126ab2), in an attempt to handle the previous exception (166735). Again, this should not happen because as_page_fault() is normally identity-mapped. Note ESP=0x8001bdc4.
166750: v=0e e=0000 i=0 cpl=0 IP=0008:80126ab2 pc=80126ab2 SP=0010:8001bdc4 CR2=80126ab2
EAX=00000000 EBX=8001be10 ECX=00000000 EDX=fffff000
ESI=8011e7ab EDI=fffff000 EBP=8001bde0 ESP=8001bdc4
EIP=80126ab2 EFL=00200002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9800 DPL=0 CS32 [---]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
GS =0030 00010188 ffffffff 00cff301 DPL=3 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0028 80040000 00000068 80408904 DPL=0 TSS32-avl
GDT=     8008b140 00000040
IDT=     80174560 00000200
CR0=80000019 CR2=80126ab2 CR3=00065000 CR4=00000610
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
CCS=00000010 CCD=00000000 CCO=LOGICL
166751 - 166795
each is an attempt to handle the previous one, analogous to 166750, except that every single one of these exceptions consume a little bit of the stack. During the last one, 166795, the ESP is only 0x8001b034.
check_exception old: 0xffffffff new 0xe
166795: v=0e e=0000 i=0 cpl=0 IP=0008:80126ab2 pc=80126ab2 SP=0010:8001b034 CR2=80126ab2
EAX=00000000 EBX=8001b080 ECX=00000000 EDX=fffff000
ESI=80126ab2 EDI=fffff000 EBP=8001b050 ESP=8001b034
EIP=80126ab2 EFL=00200002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9800 DPL=0 CS32 [---]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
GS =0030 00010188 ffffffff 00cff301 DPL=3 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0028 80040000 00000068 80408904 DPL=0 TSS32-avl
GDT=     8008b140 00000040
IDT=     80174560 00000200
CR0=80000019 CR2=80126ab2 CR3=00065000 CR4=00000610
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
CCS=00000010 CCD=00000000 CCO=LOGICL
166796
here we are in the page fault exception handler (EIP=0x8010a595), trying to push the constant 0xe on the stack. ESP is now 0x8001b000, which means that we are trying to access memory at 0x8001affc, which is already a different page. So thanks to the above recursion in exception handling, we have just overflown the original stack space, which on HelenOS/ia32 is 4KiB. Moreover, we hit a page, which is not mapped, which again is strange thanks to the identity mapping that should be in place.
check_exception old: 0xffffffff new 0xe
166796: v=0e e=0002 i=0 cpl=0 IP=0008:8010a595 pc=8010a595 SP=0010:8001b000 CR2=8001affc
EAX=00000010 EBX=8001b080 ECX=00000000 EDX=fffff000
ESI=80126ab2 EDI=fffff000 EBP=00000000 ESP=8001b000
EIP=8010a595 EFL=00200002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9800 DPL=0 CS32 [---]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
GS =0030 00010188 ffffffff 00cff301 DPL=3 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0028 80040000 00000068 80408904 DPL=0 TSS32-avl
GDT=     8008b140 00000040
IDT=     80174560 00000200
CR0=80000019 CR2=8001affc CR3=00065000 CR4=00000610
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
CCS=00000010 CCD=00000000 CCO=LOGICL  
166797
here the processor encountered a double fault exception in an attempt to save previous execution state on the privileged stack; this is understandable because from the previous exception we now that the stack is not mapped.
check_exception old: 0xe new 0xe
166797: v=08 e=0000 i=0 cpl=0 IP=0008:8010a595 pc=8010a595 SP=0010:8001b000 EAX=00000010
EAX=00000010 EBX=8001b080 ECX=00000000 EDX=fffff000
ESI=80126ab2 EDI=fffff000 EBP=00000000 ESP=8001b000
EIP=8010a595 EFL=00200002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9800 DPL=0 CS32 [---]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0023 00000000 ffffffff 00cff300 DPL=3 DS   [-WA]
GS =0030 00010188 ffffffff 00cff301 DPL=3 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0028 80040000 00000068 80408904 DPL=0 TSS32-avl
GDT=     8008b140 00000040
IDT=     80174560 00000200
CR0=80000019 CR2=8001affc CR3=00065000 CR4=00000610
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
CCS=00000010 CCD=00000000 CCO=LOGICL  

The double fault could not have been handled either, so the processor hit a triple fault and performed a reset. This is were the operating system lost control and started to appear hung.

comment:5 by Jakub Jermář, 14 years ago

We should be asking ourselves why are some parts of the kernel address space (rooted in PTL0 0x65000) not mapped. Had they been mapped as usual, the system would be fine.

comment:6 by Jakub Jermář, 14 years ago

Owner: set to Jakub Jermář
Status: newaccepted

I think I know who the culprit is.

The page tables are corrupted, because they were freed by the address space. The problem is that when the last thread within a task exits, the scheduler calls thread_destroy(), which in turn calls task_destroy(), which in turn calls as_destroy(). On architectures with hierarchical page tables, as_destroy() destroys the page tables, but the kernel continues to use them until another address space is installed.

I verified this by printing a debug message after page tables rooted at 0x65000 (this address gets regularly assigned to one of the initial address spaces) are destroyed. When I reproduced the issue, the message appeared at the same time the system appeared hung. qemu.log pointed to 0x65000 too, which can be considered a conclusive evidence.

comment:7 by Jakub Jermář, 14 years ago

Priority: majorcritical

comment:8 by Jakub Jermář, 14 years ago

The issue is 100% reproducible with the following change:

=== modified file 'kernel/generic/src/mm/as.c'
--- kernel/generic/src/mm/as.c	2010-03-23 14:41:06 +0000
+++ kernel/generic/src/mm/as.c	2010-05-01 09:59:25 +0000
@@ -249,6 +249,7 @@
 
 	btree_destroy(&as->as_area_btree);
 #ifdef AS_PAGE_TABLE
+	SET_PTL0_ADDRESS(as->genarch.page_table);
 	page_table_destroy(as->genarch.page_table);
 #else
 	page_table_destroy(NULL);

On ia32, SET_PTL0_ADDRESS has the side effect of invalidating the TLBs so the bug is much more likely to show. With this change, Qemu with one processor reboots early after the kernel banner is printed. Qemu with more CPUs hard hangs at about the same point.

in reply to:  8 comment:9 by Jakub Jermář, 14 years ago

Replying to jermar:

The issue is 100% reproducible with the following change:

=== modified file 'kernel/generic/src/mm/as.c'
--- kernel/generic/src/mm/as.c	2010-03-23 14:41:06 +0000
+++ kernel/generic/src/mm/as.c	2010-05-01 09:59:25 +0000
@@ -249,6 +249,7 @@
 
 	btree_destroy(&as->as_area_btree);
 #ifdef AS_PAGE_TABLE
+	SET_PTL0_ADDRESS(as->genarch.page_table);
 	page_table_destroy(as->genarch.page_table);
 #else
 	page_table_destroy(NULL);

On ia32, SET_PTL0_ADDRESS has the side effect of invalidating the TLBs so the bug is much more likely to show. With this change, Qemu with one processor reboots early after the kernel banner is printed. Qemu with more CPUs hard hangs at about the same point.

This is actually a flawed example. Instead, something like the following can be used to reproduce:

 	btree_destroy(&as->as_area_btree);
 #ifdef AS_PAGE_TABLE
 	page_table_destroy(as->genarch.page_table);
+	memsetb((void *)PA2KA(as->genarch.page_table), PAGE_SIZE, 0);
+	if (as == AS) {
+		SET_PTL0_ADDRESS(as->genarch.page_table);
+	}
 #else
 	page_table_destroy(NULL);
 #endif

comment:10 by Jakub Jermář, 14 years ago

There is also another similar scenario, which can lead to the observed triple fault. Imagine a thread running on cpu0. When the thread is preempted or goes to sleep, find_best_thread() until as_switch() will continue to run using its address space and CR3. But if the thread gets migrated to cpu1 and exits there, it's address space will be deallocated. But cpu0 may be still using it.

comment:11 by Jakub Jermář, 14 years ago

Fixed in changeset:head,421.

comment:12 by Jakub Jermář, 14 years ago

Resolution: fixed
Status: acceptedclosed
Note: See TracTickets for help on using tickets.