Opened 12 years ago

Closed 12 years ago

#379 closed defect (fixed)

Strange usbhub crash in async_get_call_timeout()

Reported by: Jakub Jermář Owned by:
Priority: major Milestone: 0.5.0
Component: helenos/drv/usbhub Version: mainline
Keywords: Cc:
Blocker for: Depends on:
See also:

Description

As of default ia32 mainline,1259 UP, but not before mainline,1250, usbhub hits a strange assertion in the async framework:

ipc_callid_t async_get_call_timeout(ipc_call_t *call, suseconds_t usecs)
{
        assert(call);

It is strange because call usually, and in this case specifically, is an address on the stack, i.e. never NULL. amd64 in the same configuration does not reproduce the problem.

I am going to attach a few screenshots I took when the system crashed.

Attachments (5)

SAM_0340.JPG (1.1 MB ) - added by Jakub Jermář 12 years ago.
SAM_0341.JPG (1.1 MB ) - added by Jakub Jermář 12 years ago.
SAM_0342.JPG (1.1 MB ) - added by Jakub Jermář 12 years ago.
SAM_0343.JPG (1.1 MB ) - added by Jakub Jermář 12 years ago.
usbhub.disasm (2.0 MB ) - added by Jakub Jermář 12 years ago.

Change History (11)

by Jakub Jermář, 12 years ago

Attachment: SAM_0340.JPG added

by Jakub Jermář, 12 years ago

Attachment: SAM_0341.JPG added

by Jakub Jermář, 12 years ago

Attachment: SAM_0342.JPG added

by Jakub Jermář, 12 years ago

Attachment: SAM_0343.JPG added

by Jakub Jermář, 12 years ago

Attachment: usbhub.disasm added

comment:1 by Jakub Jermář, 12 years ago

From the stacktrace as displayed on one of the screenshots, we see that the place from where async_get_call_timeout() is called is this:

    9013:       8d 7d c0                lea    -0x40(%ebp),%edi     <= "&call"
    9016:       66 90                   xchg   %ax,%ax
    9018:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%esp)       <= no timeout
    901f:       00
    9020:       89 3c 24                mov    %edi,(%esp)
    9023:       e8 d8 d5 00 00          call   16600 <async_get_call_timeout>
    9028:       89 c6                   mov    %eax,%esi

And here is the beginning of async_get_call_timeout():

   16600:       55                      push   %ebp
   16601:       89 e5                   mov    %esp,%ebp
   16603:       57                      push   %edi
   16604:       56                      push   %esi
   16605:       53                      push   %ebx
   16606:       83 ec 2c                sub    $0x2c,%esp
   16609:       8b 75 08                mov    0x8(%ebp),%esi
   1660c:       85 f6                   test   %esi,%esi
   1660e:       0f 84 eb 01 00 00       je     167ff <async_get_call_timeout+0x1ff>
Last edited 12 years ago by Jakub Jermář (previous) (diff)

comment:2 by Jakub Jermář, 12 years ago

This is beginning to look a lot like corrupted register EDI. I have built uspace with -O[0-3] and in all cases except of -O0, I can see that at one point there was a non-zero / valid preserved register EDI and several instructions later, usbhub crashed because of 0 in or propagated from EDI:

  • with -O3, EDI is computed as EBP - 40 (see 379#comment:1), a guaranteed non-zero
  • with -O1 and -O2, EDI holds a non-zero address of dev in driver_dev_add() and we crash in list_append() when accessing dev->link

In the former case, there are no interleaved function calls, which makes one think that EDI is corrupted because of the kernel preemption somehow. I was not able to spot anything suspicious in the ia32 preemption code though.

Another mystery is how is this related to mainline,1250 and why we don't see other crashes happening also to other tasks besides usbhub. Interestingly, mainline,1250, touched usbhub code a lot.

comment:3 by Jakub Jermář, 12 years ago

There is a loop in driver_connection() that I originally missed. Considering the loop, we could assume the EDI register was clobbered somewhere in usb_hub_device_add() called from driver_dev_add().

comment:4 by Jakub Jermář, 12 years ago

At last, I found it. In usb_hub_process_hub_specific_info() there is:

        usb_hub_descriptor_header_t descriptor;
        size_t received_size;
        int opResult = usb_request_get_descriptor(control_pipe,
            USB_REQUEST_TYPE_CLASS, USB_REQUEST_RECIPIENT_DEVICE,
            USB_DESCTYPE_HUB, 0, 0, &descriptor,
            sizeof(usb_hub_descriptor_t), &received_size);

descriptor is a usb_hub_descriptor_header_t, but we pass sizeof(sizeof(usb_hub_descriptor_t) to usb_request_get_descriptor(). These two arguments have the meaning of buffer and buffer size in the called function.

The following patch fixes the problem for me, but I am not sure whether we should not rather make descriptor a usb_hub_descriptor_t:

=== modified file 'uspace/drv/bus/usb/usbhub/usbhub.c'
--- uspace/drv/bus/usb/usbhub/usbhub.c	2011-09-27 18:39:12 +0000
+++ uspace/drv/bus/usb/usbhub/usbhub.c	2011-10-12 11:52:48 +0000
@@ -241,7 +241,7 @@
 	int opResult = usb_request_get_descriptor(control_pipe,
 	    USB_REQUEST_TYPE_CLASS, USB_REQUEST_RECIPIENT_DEVICE,
 	    USB_DESCTYPE_HUB, 0, 0, &descriptor,
-	    sizeof(usb_hub_descriptor_t), &received_size);
+	    sizeof(usb_hub_descriptor_header_t), &received_size);
 	if (opResult != EOK) {
 		usb_log_error("Failed to receive hub descriptor: %s.\n",
 		    str_error(opResult));

comment:5 by Jakub Jermář, 12 years ago

On 10/12/2011 02:53 PM, Jan Vesely wrote:
> only data from the first 7 bytes are used, so asking for
> sizeof(usb_hub_descriptor_header_t)
> is enough, this is a leftover from usbhub rewrite, sorry

comment:6 by Jakub Jermář, 12 years ago

Resolution: fixed
Status: newclosed

Fixed in mainline,1271.

Note: See TracTickets for help on using tickets.