Version 4 (modified by Jakub Jermář, 8 years ago)

Using QEMU and GDB to debug kernel and uspace tasks

Some of the debugging techniques and procedures described in this article are illustrated with an example on the ia32 architecture. Other architectures should behave in a similar or analogous way, but dealing with potential differences is left to the reader as an exercise.

The combination of QEMU and GDB allows HelenOS to be comfortably debugged at either the assembly or the source code level. For detailed information on low-level debugging, see for example this course on crash dump analysis.

Preparing the build

The default HelenOS build should produce unstripped binaries. If necessary, this can be enforced by making sure the Strip binaries build configuration option is not checked. Unstripped binaries come with symbols, but do not contain any richer debugging information. To get the most out of the debug build, configure HelenOS with the Line debugging information option enabled. Compiler optimization can also impede debugging, so consider setting the optimization level to 0 using the OPTIMIZATION variable in the respective Makefiles (mainline/kernel/Makefile or mainline/uspace/Makefile.common). Once everything is set, rebuild with the new settings.

Starting QEMU

QEMU provides two command line options for debugging with GDB: -s and -S. The former instructs QEMU to listen for GDB connections on localhost:1234 (without waiting for one), and the latter stops the guest CPU at startup so that debugging is possible from the very beginning. When starting emulation using the mainline/tools/ script, these options can be added like this:

$ `tools/ -d` -s -S

Connecting GDB to QEMU

Once QEMU is started with the -s (and optionally also the -S) option, it is possible to connect GDB to it. For our purposes, we will assume that the respective cross-GDB built by the mainline/tools/ is ready to be used and installed as /usr/local/cross/ia32/bin/i686-pc-linux-gnu-gdb. The cross-GDB must always match the architecture of the HelenOS guest.

$ /usr/local/cross/ia32/bin/i686-pc-linux-gnu-gdb
GNU gdb (GDB) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=i686-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
Find the GDB manual and other documentation resources online at:
For help, type "help".
Type "apropos word" to search for commands related to "word".

Connect to QEMU using the following command at the (gdb) prompt:

(gdb) target remote :1234
Remote debugging using :1234
0x0000fff0 in ?? ()

Note that if QEMU was not started with the -S option, you will have to manually break into the debugger by pressing Ctrl-C.

Loading symbols

Depending on what we are debugging, we will need different symbols. The easiest case is kernel debugging. In that case, we simply load the kernel symbols from kernel.raw:

(gdb) symbol-file kernel/kernel.raw 
Reading symbols from kernel/kernel.raw...done.

Debugging userspace tasks is a little more complicated because, unlike the kernel, there will be multiple user processes running at the same time, so simply setting a breakpoint on a user address will probably not be enough. We will need some assistance from the kernel.

Setting and hitting breakpoints

To set a breakpoint, for example on the kernel function that handles invalid memory accesses from userspace, we type the following command:

(gdb) break fault_from_uspace_core
Breakpoint 1 at 0x801237ff: file generic/src/interrupt/interrupt.c, line 169.

Unfortunately, a breakpoint set this way is not always hit. The cause of this behavior is currently unknown.

To continue the emulation, tell GDB to continue:

(gdb) c

When our breakpoint is later hit, for example as a result of running the following command inside HelenOS:

# tester fault1

we will break into the debugger prompt again:

Breakpoint 1, fault_from_uspace_core (istate=0x86d47fb4, fmt=fmt@entry=0x8016f3ce "Page fault: %p.", args=args@entry=0x86d47ec4 "\004") at generic/src/interrupt/interrupt.c:169
169	{

Note that you can also break into the debugger at any time while the guest is running by pressing Ctrl-C.

Now that a breakpoint in the kernel was hit, we can inspect the state of the kernel a little bit. Typing bt will show us the stack trace of the current kernel thread:

(gdb) bt
#0  fault_from_uspace_core (istate=0x86d47fb4, fmt=fmt@entry=0x8016f3ce "Page fault: %p.", args=args@entry=0x86d47ec4 "\004") at generic/src/interrupt/interrupt.c:169
#1  0x80123e2a in fault_if_from_uspace (istate=istate@entry=0x86d47fb4, fmt=fmt@entry=0x8016f3ce "Page fault: %p.") at generic/src/interrupt/interrupt.c:206
#2  0x80130bc9 in as_page_fault (address=4, access=PF_ACCESS_WRITE, istate=istate@entry=0x86d47fb4) at generic/src/mm/as.c:1497
#3  0x8010f978 in page_fault (n=14, istate=0x86d47fb4) at arch/ia32/src/mm/page.c:99
#4  0x80123bda in exc_dispatch (n=14, istate=0x86d47fb4) at generic/src/interrupt/interrupt.c:131
#5  0x8010ab62 in int_14 () at arch/ia32/src/asm.S:437
#6  0x0000000e in ?? ()

To see the information about the interrupted context, in this case the user context as it existed when the page fault exception occurred, we can print the istate structure:

(gdb) set radix 16
Input and output radices now set to decimal 16, hex 10, octal 20.
(gdb) p *istate
$2 = {edx = 0x80, ecx = 0x7, ebx = 0x7013cee0, esi = 0x70037f98, edi = 0x7001b784, ebp = 0x7013ce78, eax = 0x4, ebp_frame = 0x0, eip_frame = 0x3da7, gs = 0x30, fs = 0x23, es = 0x23, ds = 0x23, 
  error_word = 0x6, eip = 0x3da7, cs = 0x1b, eflags = 0x10202, esp = 0x7013ce78, ss = 0x23}

The printed eip member is the program counter of the instruction which caused the exception. We will remember this value along with the value of esp and ebp for later.
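
The error_word and cs members printed above can also be decoded by hand. The following minimal Python sketch does this under the assumption that error_word holds the standard ia32 page-fault error code and that cs holds the selector of the interrupted code segment. The function names are hypothetical and are not part of HelenOS or GDB.

```python
# Sketch: decode the ia32 page-fault error code and the privilege level
# stored in a segment selector. Function names are hypothetical.

def decode_page_fault_error(error_word):
    """Interpret the low three bits of an ia32 page-fault error code."""
    return {
        "page_present": bool(error_word & 0x1),  # bit 0: 0 = page not present
        "write": bool(error_word & 0x2),         # bit 1: 1 = write access
        "user_mode": bool(error_word & 0x4),     # bit 2: 1 = access from user mode
    }


def selector_rpl(selector):
    """Return the requested privilege level (low two bits) of a selector."""
    return selector & 0x3


if __name__ == "__main__":
    # Values taken from the istate dump above.
    print(decode_page_fault_error(0x6))
    print(selector_rpl(0x1b))  # prints 3, i.e. user mode
```

For the values shown above, this reports a write from user mode to a page that is not present, which agrees with the PF_ACCESS_WRITE argument and the faulting address 4 in frame #2 of the backtrace.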

Useful macros

When debugging the kernel, it is sometimes useful to find out some information about the current process. For that, we will need to mimic the computation of the address of the THE structure. We will start by defining a macro:

(gdb) macro define the ((the_t *) ((uintptr_t )$esp & ~0x1fff))

From now on, we can do things like:

(gdb) p the->task->name
$3 = "tester\000it\000\000\000\000\000\000\000\000\000\000"

Note that this will work only when $esp corresponds to the kernel stack.
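
To make the arithmetic behind the macro explicit, here is a small Python sketch of the same masking step. It assumes 8 KiB kernel stacks aligned to their size, which is what the 0x1fff mask in the macro implies. The function name the_address is hypothetical.

```python
# Sketch of the address computation performed by the "the" macro.
# Assumption: kernel stacks are 0x2000 bytes and aligned to their size,
# as implied by the 0x1fff mask used in the macro.

STACK_SIZE = 0x2000


def the_address(esp, stack_size=STACK_SIZE):
    """Round a kernel stack pointer down to the base of its stack."""
    return esp & ~(stack_size - 1) & 0xFFFFFFFF  # keep the result 32-bit


if __name__ == "__main__":
    # The istate pointer from the backtrace is used here as a stand-in for
    # a kernel stack address (assumption: istate lives on the kernel stack).
    print(hex(the_address(0x86d47fb4)))  # prints 0x86d46000
```

Any address within the same kernel stack yields the same result, which is why the macro works for every $esp value as long as it points into the kernel stack.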

Switching to the user context

Let us assume that we have somehow found the values of the userspace registers EIP, ESP and EBP (for example by inspecting the istate structure as shown above) and know the name of the process (for example tester from the example above). Before we can add symbol information for this process, we need to find out the load address of its .text section (for some reason, GDB does not use the information provided in the ELF file):

$ objdump -h uspace/app/tester/tester

uspace/app/tester/tester:     file format elf32-i386

Idx Name          Size      VMA       LMA       File off  Algn
  0 .init         0000002e  000010b4  000010b4  000000b4  2**0
  1 .text         0002f5a0  000010e8  000010e8  000000e8  2**3
  2 .data         00001b08  000316a0  000316a0  0002f6a0  2**5
                  CONTENTS, ALLOC, LOAD, DATA
  3 .tbss         00000048  000331a8  000331a8  000311a8  2**2
                  ALLOC, THREAD_LOCAL
  4 .bss          00000184  000331c0  000331c0  000311a8  2**5
  5 .comment      00000011  00000000  00000000  000311a8  2**0
                  CONTENTS, READONLY
  6 .debug_abbrev 0000878f  00000000  00000000  000311b9  2**0
  7 .debug_aranges 00000db8  00000000  00000000  00039948  2**3
  8 .debug_info   000293a5  00000000  00000000  0003a700  2**0
  9 .debug_line   0000cfc7  00000000  00000000  00063aa5  2**0
 10 .debug_ranges 00000358  00000000  00000000  00070a6c  2**0
 11 .debug_str    000078fc  00000000  00000000  00070dc4  2**0

So the .text section gets loaded at 0x10e8 in this case. We will use this address to load our userspace symbols:
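
Reading the address off the listing by hand is error-prone when it has to be repeated for several binaries. The following Python sketch extracts the VMA of a named section from text in the format printed by objdump -h above. The function name section_vma and the SAMPLE constant are hypothetical, and the parser assumes the column layout shown in this article.

```python
# Sketch: find a section's VMA in "objdump -h" output.
# Assumes the column layout shown above: Idx Name Size VMA LMA File-off Algn.

SAMPLE = """\
Idx Name          Size      VMA       LMA       File off  Algn
  0 .init         0000002e  000010b4  000010b4  000000b4  2**0
  1 .text         0002f5a0  000010e8  000010e8  000000e8  2**3
  2 .data         00001b08  000316a0  000316a0  0002f6a0  2**5
                  CONTENTS, ALLOC, LOAD, DATA
"""


def section_vma(objdump_output, section):
    """Return the VMA of the given section as an int, or None if absent."""
    for line in objdump_output.splitlines():
        fields = line.split()
        # Section rows start with a numeric index followed by the name.
        if len(fields) >= 4 and fields[0].isdigit() and fields[1] == section:
            return int(fields[3], 16)
    return None


if __name__ == "__main__":
    print(hex(section_vma(SAMPLE, ".text")))  # prints 0x10e8
```

In practice the text would come from running objdump -h on the binary, for example through Python's subprocess module, rather than from a hard-coded sample.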

(gdb) add-symbol-file uspace/app/tester/tester 0x000010e8
add symbol table from file "uspace/app/tester/tester" at
	.text_addr = 0x10e8
(y or n) y
Reading symbols from uspace/app/tester/tester...done.

The last step before we can print out our userspace stack trace is restoring registers to their userspace contents (as for example captured in the istate structure):

(gdb) set $eip=0x3da7
(gdb) set $esp=0x7013ce78
(gdb) set $ebp=0x7013ce78
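
The three set commands above can be derived mechanically from the printed istate structure. The Python sketch below parses such a dump and emits the commands. It assumes the hexadecimal output format obtained after set radix 16, and the function names parse_gdb_struct and restore_commands are hypothetical.

```python
# Sketch: turn a printed istate structure into GDB "set" commands.
# Assumes hexadecimal output, as produced after "set radix 16".
import re

ISTATE_DUMP = (
    "$2 = {edx = 0x80, ecx = 0x7, ebx = 0x7013cee0, esi = 0x70037f98, "
    "edi = 0x7001b784, ebp = 0x7013ce78, eax = 0x4, ebp_frame = 0x0, "
    "eip_frame = 0x3da7, gs = 0x30, fs = 0x23, es = 0x23, ds = 0x23, "
    "error_word = 0x6, eip = 0x3da7, cs = 0x1b, eflags = 0x10202, "
    "esp = 0x7013ce78, ss = 0x23}"
)


def parse_gdb_struct(text):
    """Return {member: value} for every 'name = 0x...' pair in the dump."""
    pairs = re.findall(r"(\w+) = (0x[0-9a-fA-F]+)", text)
    return {name: int(value, 16) for name, value in pairs}


def restore_commands(istate, registers=("eip", "esp", "ebp")):
    """Build the GDB commands that load the saved register values."""
    return ["set ${}={:#x}".format(reg, istate[reg]) for reg in registers]


if __name__ == "__main__":
    for command in restore_commands(parse_gdb_struct(ISTATE_DUMP)):
        print(command)
```

The printed lines can be pasted at the (gdb) prompt or saved to a file and loaded with GDB's source command.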

We are now ready to do some userspace debugging:

(gdb) bt
#0  0x00003da7 in test_fault1 () at fault/fault1.c:34
#1  0x000010f6 in run_test (test=0x31770 <tests+208>) at tester.c:84
#2  0x00001374 in main (argc=0x43060, argv=0x0) at tester.c:161
#3  0x000134e3 in __main (pcb_ptr=0x7001b784) at generic/libc.c:121
#4  0x000010e2 in __entry () at arch/ia32/src/entry.S:69
#5  0x7001b784 in ?? ()
(gdb) disassemble
Dump of assembler code for function test_fault1:
   0x00003d9f <+0>:	push   %ebp
   0x00003da0 <+1>:	mov    %esp,%ebp
   0x00003da2 <+3>:	mov    $0x4,%eax
=> 0x00003da7 <+8>:	movl   $0x0,(%eax)
   0x00003dad <+14>:	mov    $0x2d973,%eax
   0x00003db2 <+19>:	pop    %ebp
   0x00003db3 <+20>:	ret    
End of assembler dump.

The two snippets above confirm that the page fault exception was indeed deliberately triggered by the tester application when running the fault1 test.
