Opened 2 years ago

Closed 22 months ago

#840 closed defect (fixed)

Network link not coming up in VirtualBox (Intel Pro/1000)

Reported by: Jiri Svoboda Owned by:
Priority: major Milestone: 0.12.1
Component: helenos/unspecified Version: mainline
Keywords: virtualbox Cc:
Blocker for: Depends on:
See also:

Description

When running HelenOS in VirtualBox, HelenOS DHCP client fails to obtain an address (dnscfg prints "Nameserver: none") and link state is shown as down.

VirtualBox can emulate two different PCnet adapters and VirtIO-net (none of these are supported by HelenOS). It can also emulate three different Intel Pro/1000 models. If I use one of the two MT models, I get the symptoms described above. For the third one, the T model, I don't get a NIC instance at all.

It seems currently there is no VirtualBox configuration in which we could get networking to work. Also note that the default network adapter for OS Other - Other/Unknown is PCnet-FAST III, which we don't support at all.

Change History (18)

comment:1 by Jiri Svoboda, 2 years ago

Keywords: virtualbox added

comment:2 by Jiri Svoboda, 2 years ago

Milestone: 0.12.1

comment:3 by Jiri Svoboda, 2 years ago

I tested this again with the latest HelenOS and VirtualBox 6.1. I'm not sure why, but I am now getting link 'up' for both MT adapters (was it really down before?), yet DHCP times out trying to obtain an address / DNS server. Here are the detailed results:

Attached to: NAT
Adapter Type: Intel PRO/1000 MT Desktop (82540EM)

# nic
...
Link state: up
Broadcast receive mode: accepted

log/dhcp.log:
[dhcp] note: net/eth1: dhcpsrv_request_timeout
...
[dhcp] note: Retries exhausted
[dhcp] note: Giving up on link net/eth1

Attached to: NAT
Adapter Type: Intel PRO/1000 T Server (82543GC)

# nic
[no adapter]

Attached to: NAT
Adapter Type: Intel PRO/1000 MT Server (82545EM)

# nic
...
Link state: up
Broadcast receive mode: accepted

log/dhcp.log:
[dhcp] note: net/eth1: dhcpsrv_request_timeout
...
[dhcp] note: Retries exhausted
[dhcp] note: Giving up on link net/eth1

comment:4 by Jiri Svoboda, 2 years ago

I tried Haiku in VirtualBox 6.1, with NAT networking. Everything works with any of the VirtualBox-provided network adapters (both PCnet, all three Intel, Virtio-net). In my case, the VM received IP address 10.0.

Looking at Haiku's /var/log/syslog, the DHCP daemon succeeded in obtaining an address (10.0.2.15), subnet mask (255.255.255.0), gateway (10.0.2.2) and nameserver (10.0.2.3) from the DHCP server at 10.0.2.2.

comment:5 by Jiri Svoboda, 2 years ago

If I manually configure the IPv4 address 10.0.2.15 in HelenOS (with Intel 82540EM), I can successfully ping 10.0.2.2 and 10.0.2.3(!)

This means the NIC is transmitting/receiving frames and IP works. It thus seems that just DHCP (or possibly broadcast) is not working.

comment:6 by Jiri Svoboda, 2 years ago

As noted above, I can ping 10.0.2.3 (which should be the DNS nameserver address), but if I manually configure it (dnscfg set-ns 10.0.2.3) and try to resolve an address, it waits for a bit, then times out with an error.

comment:7 by Jiri Svoboda, 2 years ago

If I manually configure a default route

/ # inet create-sr 0.0.0.0/0 10.0.2.2 default

then ping helenos.org:

/ # ping 82.208.58.129

it works! It looks like ICMP works just fine, but we might be having trouble with UDP and/or TCP.

comment:8 by Jiri Svoboda, 2 years ago

When I use IP address of helenos.org and use the download tool:

/ # download http://82.208.58.129/
Server returned status 403 Forbidden
/ #

It worked! The server returned an error because we didn't supply the correct Host field in the HTTP request.

This means TCP works. That narrows down the problem to just UDP (more likely) or DNS+DHCP (less likely).

comment:9 by Jiri Svoboda, 2 years ago

After fixing a problem in netecho (not being able to send any messages), I tried the following:

On the Linux host I started ncat -l -u 1234 (listen on UDP port 1234). Then I tried sending messages from HelenOS in VirtualBox using # netecho -d <host-address>:1234 and it worked!

That means UDP works. Just DNS and DHCP do not.
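
For anyone repeating this test without ncat, the host-side listener can be approximated in a few lines. This is only a sketch of an equivalent UDP listener, assuming the same port 1234; the function names are made up for illustration and are not part of any HelenOS tool.

```python
import socket


def open_udp_listener(host="0.0.0.0", port=1234):
    """Bind a UDP socket, roughly what `ncat -l -u 1234` does on the host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one(sock, timeout=5.0):
    """Wait for a single datagram; raises socket.timeout if nothing arrives."""
    sock.settimeout(timeout)
    data, sender = sock.recvfrom(2048)
    return data, sender


if __name__ == "__main__":
    listener = open_udp_listener()
    try:
        payload, peer = receive_one(listener, timeout=60.0)
        print(f"received {payload!r} from {peer}")
    finally:
        listener.close()
```

With this running on the host, a message sent from the guest with netecho should show up together with the (NATed) source address.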

comment:10 by Jiri Svoboda, 2 years ago

I removed nconfsrv from init, which means we can intervene before DHCP negotiation starts. Then we can do

# logset dhcp debug2
# /srv/net/nconfsrv

Looking at the log we can see:

  • We send DHCPDISCOVER
  • We receive offer (address 10.0.2.15/24, router 10.0.2.2, DNS 127.0.0.53, …)
  • We send DHCPREQUEST
  • We time out waiting for the answer

comment:11 by Jiri Svoboda, 2 years ago

I enabled logging in udp and inetsrv and verified that the DHCPACK is not seen on UDP or IP layer in HelenOS, meaning the DHCP server probably did not respond to our DHCPREQUEST (as opposed to it responding but us dropping the message).

So we need to figure out why the DHCP server in VirtualBox is okay with our DHCPDISCOVER but does not like our DHCPREQUEST.

comment:12 by Jiri Svoboda, 2 years ago

The problem is that the HelenOS DHCP client was setting ciaddr in the DHCP request header. ciaddr is to be filled in with the client's current IP address when renewing it. When requesting an address for the first time, we are in the SELECTING state, and RFC 2131 states in section 4.3.2:

  o DHCPREQUEST generated during SELECTING state:

      Client inserts the address of the selected server in 'server
      identifier', 'ciaddr' MUST be zero, 'requested IP address' MUST be
      filled in with the yiaddr value from the chosen DHCPOFFER.

It seems Qemu's DHCP server is tolerant here, but VirtualBox's is not.

Fixed this in changeset af259da6cd1876ab810c671932715fd43fabdc48.
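
To make the rule concrete, here is a minimal sketch of a DHCPREQUEST as it should look in the SELECTING state. This is not the HelenOS code from the changeset, just an illustration of the RFC 2131 layout; the function name and the MAC address in the usage note are made up, and the broadcast flag matches what HelenOS sends.

```python
import socket
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
BOOTREQUEST = 1
HTYPE_ETHERNET = 1
FLAG_BROADCAST = 0x8000

OPT_REQUESTED_IP = 50
OPT_MESSAGE_TYPE = 53
OPT_SERVER_ID = 54
OPT_END = 255
DHCPREQUEST = 3


def build_selecting_request(xid, mac, offered_ip, server_ip):
    """Build a DHCPREQUEST for the SELECTING state (RFC 2131, section 4.3.2)."""
    zero = socket.inet_aton("0.0.0.0")
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        BOOTREQUEST, HTYPE_ETHERNET, 6, 0,  # op, htype, hlen, hops
        xid, 0, FLAG_BROADCAST,             # xid, secs, flags
        zero,                               # ciaddr: MUST be zero in SELECTING
        zero, zero, zero,                   # yiaddr, siaddr, giaddr
        mac,                                # chaddr (struct zero-pads to 16 bytes)
        b"", b"",                           # sname, file (zero-filled)
    )
    options = (
        bytes([OPT_MESSAGE_TYPE, 1, DHCPREQUEST])
        # 'requested IP address' carries yiaddr from the chosen DHCPOFFER
        + bytes([OPT_REQUESTED_IP, 4]) + socket.inet_aton(offered_ip)
        # 'server identifier' names the server whose offer we selected
        + bytes([OPT_SERVER_ID, 4]) + socket.inet_aton(server_ip)
        + bytes([OPT_END])
    )
    return header + DHCP_MAGIC_COOKIE + options
```

For the VirtualBox NAT case above this would be called as build_selecting_request(xid, mac, "10.0.2.15", "10.0.2.2"), with the offered address going into option 50 rather than into ciaddr.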

Now we successfully get the IP address, subnet mask, DNS server and default gateway from DHCP:

  • Address 10.0.2.15/24
  • Router 10.0.2.2
  • DNS server 127.0.0.53

Still DNS does not work.
The address 127.0.0.53 looks very suspicious. Considering that 53 is the code for DHCP option 'DHCP message type' I have a hunch that we did not parse the options in the DHCPOFFER/DHCPACK correctly.

If I manually set the DNS server address to 10.0.2.3, DNS still does not work. So I guess we have yet another problem there.

comment:13 by Jiri Svoboda, 2 years ago

Sending DNS queries to 127.0.0.53 does not work: because the loopback link has address 127.0.0.1/24, the query gets sent down the loopback link, comes back, and is then dropped (because we don't have address 127.0.0.53).
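
The two checks involved can be illustrated with a small sketch. It only models the on-link test and the local-address test with Python's ipaddress module; it is an assumption about the logic, not how inetsrv is actually implemented, and the function names are made up.

```python
import ipaddress


def is_on_link(destination, interface_cidr):
    """True if the destination falls inside the interface's network prefix."""
    iface = ipaddress.ip_interface(interface_cidr)
    return ipaddress.ip_address(destination) in iface.network


def is_local_address(destination, interface_cidr):
    """True if the destination is exactly the address configured on the link."""
    iface = ipaddress.ip_interface(interface_cidr)
    return ipaddress.ip_address(destination) == iface.ip


# 127.0.0.53 matches the loopback prefix, so it is routed to the loopback link...
routed_to_loopback = is_on_link("127.0.0.53", "127.0.0.1/24")
# ...but it is not an address we own, so the looped-back packet is dropped.
accepted_locally = is_local_address("127.0.0.53", "127.0.0.1/24")
```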

When I manually set the DNS server address to 10.0.2.3 and enable debug2 on inetsrv, I can see that the DNS requests are being sent, but nothing comes back.

I dumped the DNS requests and they are byte-for-byte the same as those sent by 'getent hosts xxx' in Linux.

Looking at my host's /etc/resolv.conf I can see where the 127.0.0.53 came from. It contains 'nameserver 127.0.0.53', which looks like systemd's local name server.

Now I don't understand why VirtualBox passed this address to HelenOS via DHCP. If I run Haiku in VirtualBox in practically the same configuration, it displays 10.0.2.3 as the DNS server address (and works correctly).

comment:14 by Jiri Svoboda, 2 years ago

If I edit my host's /etc/resolv.conf and put in 'nameserver 192.168.0.1' (the IP address of my wireless router), I can still resolve host names from Linux. In HelenOS/VirtualBox I now get 192.168.0.1 as the DNS server address via DHCP(!). I can ping this address. At first it seemed like DNS requests still did not work, but then it started to work. Now I can resolve host names from within HelenOS/VB, where the nameserver in HelenOS is configured to 192.168.0.1(!) This shows that UDP/bidirectional NAT works as expected.

It would be easy to blame this strange behavior on systemd+VirtualBox, but that does not explain why it only happens with HelenOS and not with Haiku (Linux, etc). Why does Haiku get DNS address 10.0.2.3 while HelenOS gets a different one?

comment:15 by Jiri Svoboda, 22 months ago

I dumped the DHCP server responses and the incriminating address 127.0.0.53 appears both in DHCPOFFER and DHCPACK. Thus the problem occurs as soon as we send DHCPDISCOVER.
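
For checking a dumped response by hand, a small option walker is enough. The sketch below parses the options area that follows the magic cookie and extracts option 6 (Domain Name Server); the function names are made up and this is not the HelenOS parser.

```python
import socket

OPT_PAD = 0
OPT_DNS_SERVER = 6
OPT_END = 255


def parse_dhcp_options(data):
    """Parse the code/length/value options that follow the DHCP magic cookie."""
    options = {}
    i = 0
    while i < len(data):
        code = data[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:  # pad is a single byte with no length field
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated option header")
        length = data[i + 1]
        value = data[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated option value")
        options[code] = value
        i += 2 + length
    return options


def dns_servers(options):
    """Return the addresses carried in option 6 as dotted-quad strings."""
    raw = options.get(OPT_DNS_SERVER, b"")
    usable = len(raw) - len(raw) % 4
    return [socket.inet_ntoa(raw[j:j + 4]) for j in range(0, usable, 4)]
```

Feeding it the options from both the DHCPOFFER and the DHCPACK makes it easy to confirm that the same DNS address appears in each.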

I looked at the differences between the DHCP discover messages sent by HelenOS and by Linux. I couldn't find anything obviously wrong with the HelenOS messages, but there are some differences.

Here's a Wireshark dump of Linux discover message:

Dynamic Host Configuration Protocol (Discover)
    Message type: Boot Request (1)
    Hardware type: Ethernet (0x01)
    Hardware address length: 6
    Hops: 0
    Transaction ID: 0xa55e2e70
    Seconds elapsed: 2
    Bootp flags: 0x0000 (Unicast)
    Client IP address: 0.0.0.0
    Your (client) IP address: 0.0.0.0
    Next server IP address: 0.0.0.0
    Relay agent IP address: 0.0.0.0
    Client MAC address: LCFCHeFe_b8:e0:2d (50:7b:9d:b8:e0:2d)
    Client hardware address padding: 00000000000000000000
    Server host name not given
    Boot file name not given
    Magic cookie: DHCP
    Option: (53) DHCP Message Type (Discover)
    Option: (61) Client identifier
        Length: 7
        Hardware type: Ethernet (0x01)
        Client MAC address: LCFCHeFe_b8:e0:2d (50:7b:9d:b8:e0:2d)
    Option: (55) Parameter Request List
    Option: (57) Maximum DHCP Message Size
    Option: (50) Requested IP Address (10.163.47.226)
    Option: (12) Host Name
    Option: (255) End

Differences with HelenOS:

  • Transaction ID is not random (it is always 42)
  • Bootp flags: 0x8000 (broadcast)
  • Seconds elapsed is always 0
  • The only option used is 53 (message type)

I modified the transaction ID to use a random number generator and also hacked inetsrv to accept all packets regardless of target IP address so that I could try Bootp flags 0 / unicast. Neither change had any effect on the problem, so it seems this is not it.
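
For reference, the three header fields in question sit next to each other in the BOOTP header (xid, secs, flags). A minimal sketch of packing them with a random transaction ID, assuming nothing about the actual HelenOS implementation; the function name is made up:

```python
import secrets
import struct

FLAG_BROADCAST = 0x8000


def pack_xid_secs_flags(secs=0, broadcast=True, xid=None):
    """Pack the xid, secs and flags fields in network byte order."""
    if xid is None:
        xid = secrets.randbits(32)  # random transaction ID instead of a constant
    flags = FLAG_BROADCAST if broadcast else 0
    return struct.pack("!IHH", xid, secs, flags)
```

Calling pack_xid_secs_flags(secs=2, broadcast=False, xid=0xa55e2e70) reproduces the values from the Linux dump above, while pack_xid_secs_flags(xid=42) reproduces what HelenOS was sending.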

So currently it's looking like what's triggering the problem could be the absence of one of the DHCP options:

  • client identifier (61)
  • parameter request list (55)
  • maximum dhcp message size (57)
  • requested IP address (50)
  • host name (12)

comment:16 by Jiri Svoboda, 22 months ago

Okay, here's the root cause.

If the parameter request list (option 55) is not provided, or it is provided but Domain Name Server (6) is not listed in it, then the problem occurs and VirtualBox provides the incorrect DNS server address.

This is VirtualBox 6.1.

comment:17 by Jiri Svoboda, 22 months ago

I filed a bug with VirtualBox.

Apart from that, HelenOS should clearly request the DNS server address if it wants to use it, so it's an easy fix.
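
The fix amounts to adding option 55 with code 6 in it. A minimal sketch of the resulting options area for a DHCPDISCOVER follows; the exact set of requested parameters (subnet mask, router, DNS) is an assumption for illustration, not necessarily what HelenOS ends up sending, and the function name is made up.

```python
OPT_SUBNET_MASK = 1
OPT_ROUTER = 3
OPT_DNS_SERVER = 6
OPT_MESSAGE_TYPE = 53
OPT_PARAM_REQUEST_LIST = 55
OPT_END = 255
DHCPDISCOVER = 1


def build_discover_options(requested=(OPT_SUBNET_MASK, OPT_ROUTER, OPT_DNS_SERVER)):
    """Options area of a DHCPDISCOVER that explicitly asks for a DNS server."""
    params = bytes(requested)
    return (
        bytes([OPT_MESSAGE_TYPE, 1, DHCPDISCOVER])
        # Parameter Request List: one byte per option code we want back
        + bytes([OPT_PARAM_REQUEST_LIST, len(params)]) + params
        + bytes([OPT_END])
    )
```

The same list would normally be repeated in the DHCPREQUEST so that the DHCPACK carries the same parameters as the offer.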

comment:18 by Jiri Svoboda, 22 months ago

Resolution: fixed
Status: new → closed