Extremely Low Network Performance in Genode (Approx. 1/10th of Linux)

Hello Genode community,

I’m seeking guidance on unexpectedly low network performance while running Genode in a virtualized environment. In my measurements, Genode achieves only about one-tenth of the network throughput of Linux under identical conditions.

I would appreciate any insights on potential bottlenecks or best practices for high-speed networking within Genode, as my goal is to understand what is limiting the performance.

Test Setup and Results

I conducted the following experiments on a bare-metal x86 server with 32 CPU cores and 64GB of RAM. The virtualization platform used was QEMU (x86_64) without KVM acceleration.

For each guest, I allocated 8 CPU cores and 16GB of RAM. The test is a simple large-block network copy program that transfers a 128MB buffer.

1. Cross-VM Network Test (Client → Server)

I ran two separate VMs side-by-side:

Setup           | Throughput | Comparison
----------------|------------|-----------------------
Linux (archiso) | 71077 KB/s | (baseline)
Genode          | 3750 KB/s  | 5.3% of Linux’s speed

The performance difference between the two operating systems is significant.

2. Single-VM Localhost Test (Intra-VM Networking)

To isolate the issue from cross-VM network emulation overhead, I ran both the client and server components within a single Genode VM (localhost communication, similar to ip_raw.inc).

Setup                        | Throughput
-----------------------------|-----------
Single Genode VM (localhost) | 6547 KB/s

This result, while better than the cross-VM test, is still surprisingly low for an internal loopback transfer of 128MB.

Code Implementation Details

The core transfer mechanism uses the following structure. net::Socket4 is a minimal wrapper around the standard libc socket functions (read/write).

The transfer size is defined as static const long TEST_SIZE = 128 * 1024L * 1024; (128 MiB).

My client code:


static const long TEST_SIZE = 128 * 1024L * 1024;
static const long BUF_SIZE = TEST_SIZE;
static char buf[BUF_SIZE];


void localhostNetCli(Libc::Env& env) {
    Genode::Heap heap { env.ram(), env.rm() };

    adl::defaultAllocator.init({

        .alloc = [] (adl::size_t size, void* data) {
            return reinterpret_cast<Genode::Heap*>(data)->alloc(size);
        },
        
        .free = [] (void* addr, adl::size_t size, void* data) {
            reinterpret_cast<Genode::Heap*>(data)->free(addr, size);
        },
        
        .data = &heap
    });


    for (long i = 0; i < BUF_SIZE; i++) {
        buf[i] = char(i % 256);
    }


    Timer::Connection timer {env};


    net::Socket4 sock;
    sock.ip.set("10.0.3.3");
    sock.port = 12000;

    // wait for the server side to come up before connecting
    Timer::Connection sleepTimer {env};
    sleepTimer.msleep(6000);


    if (sock.connect() != monkey::Status::SUCCESS) {
        MONKEY_LOG_ERROR("Socket connect error (10.0.3.3).");

        sock.ip.set("10.0.3.2");
        if (sock.connect() != monkey::Status::SUCCESS) {
            MONKEY_LOG_ERROR("Socket connect error (10.0.3.2).");
            MONKEY_LOG_ERROR("------ CRITICAL ------");
            return;
        }
    }



    auto beg = timer.curr_time();

    sock.send(buf, TEST_SIZE);

    char recvBuf[20];
    sock.recv(recvBuf, 2);   // wait for the server's 2-byte "OK" acknowledgment

    auto end = timer.curr_time();
    auto durationMSecs = end.trunc_to_plain_ms().value - beg.trunc_to_plain_ms().value;
    MONKEY_LOG_INFO("rate: ", (BUF_SIZE) * 1000L / (1024) / durationMSecs, " KB/s");
}



void Libc::Component::construct(Libc::Env& env) {
    Libc::with_libc([&] () {
        localhostNetCli(env);
    });
    return;

}
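As a sanity check of the integer arithmetic in the rate log statement (bytes * 1000 / 1024 / milliseconds yields KB/s), the formula can be lifted into a standalone helper. The helper name and the durations used to exercise it are hypothetical, chosen to land near the reported numbers:

```cpp
// Same integer arithmetic as the MONKEY_LOG_INFO rate statement:
// bytes * 1000 / 1024 / milliseconds  ->  KB/s
static long rate_kb_per_s(long bytes, long durationMSecs)
{
    return bytes * 1000L / 1024 / durationMSecs;
}
```

For 128 MiB, the reported 6547 KB/s corresponds to a transfer time of roughly 20 seconds, which gives a feel for how slow the localhost run actually is.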

My server code:

static const long TEST_SIZE = 128 * 1024L * 1024;
static const long BUF_SIZE = TEST_SIZE;
static char buf[BUF_SIZE];

void localhostNetSvr(Libc::Env& env) {
    Genode::Heap heap { env.ram(), env.rm() };

    adl::defaultAllocator.init({

        .alloc = [] (adl::size_t size, void* data) {
            return reinterpret_cast<Genode::Heap*>(data)->alloc(size);
        },
        
        .free = [] (void* addr, adl::size_t size, void* data) {
            reinterpret_cast<Genode::Heap*>(data)->free(addr, size);
        },
        
        .data = &heap
    });


    net::Socket4 listenSocket;
    listenSocket.ip.set("0.0.0.0");
    listenSocket.port = 12000;

    listenSocket.start();

    net::Socket4 client = listenSocket.accept();

    Genode::warning(__func__, " accepted client.");

    Timer::Connection timer {env};
    auto beg = timer.curr_time();

    client.recv(buf, TEST_SIZE);

    auto end = timer.curr_time();

    client.send("OK", 2);
    client.close();

    auto durationMSecs = end.trunc_to_plain_ms().value - beg.trunc_to_plain_ms().value;
    MONKEY_LOG_INFO("rate: ", (TEST_SIZE) * 1000L / (1024) / durationMSecs, " KB/s");
}


void Libc::Component::construct(Libc::Env& env) {
    Libc::with_libc([&] () {
        localhostNetSvr(env);
    });
    return;
}

net::Socket4 is just a thin wrapper around the libc socket API, for example:

// Loop until `len` bytes have arrived: a single read() may return
// fewer bytes than requested.
adl::int64_t PromisedSocketIo::recv(void* buf, adl::size_t len) {
    adl::size_t sum = 0;
    while (sum < len) {
        adl::int64_t curr = read(socketFd, ((char*) buf) + sum, len - sum);

        if (curr <= 0) {
            return -1;   // error or EOF before the full length was received
        }

        sum += size_t(curr);
    }

    return adl::int64_t(sum);
}

adl::int64_t PromisedSocketIo::send(const void* buf, adl::size_t len) {
    // Note: unlike recv(), this does not loop; a single write() may
    // perform a short write and transmit fewer than `len` bytes.
    return ::write(socketFd, buf, len);
}
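A side note on the send path: POSIX write() is allowed to accept fewer bytes than requested (a short write), particularly on sockets whose send buffer is full, so a single call is not guaranteed to transmit all of TEST_SIZE. A looped variant in the spirit of the recv wrapper, sketched here against plain libc (send_all is a hypothetical name, not part of net::Socket4):

```cpp
#include <cerrno>
#include <cstdint>
#include <unistd.h>

// Loop until all `len` bytes are handed to the kernel: write() may
// perform a short write, especially on sockets with a full send buffer.
static int64_t send_all(int fd, const void *buf, size_t len)
{
    size_t sum = 0;
    while (sum < len) {
        ssize_t curr = ::write(fd, static_cast<const char *>(buf) + sum,
                               len - sum);
        if (curr < 0) {
            if (errno == EINTR)
                continue;   // interrupted before any byte was written: retry
            return -1;      // real error
        }
        sum += size_t(curr);
    }
    return int64_t(sum);
}
```

This also matters for the benchmark itself: if write() returns early, the client-side timing no longer measures a full 128 MB transfer.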

Question

Given that the issue persists even in the localhost test, the bottleneck might lie within Genode’s internal mechanisms:

  • The IPC overhead required for network stack (lwIP) or socket communication.

  • The network stack implementation/efficiency itself.

  • The overhead associated with large read/write calls in the VFS/libc layers.

  • The performance impact of QEMU TCG emulation on microkernel interactions.

I need to profile and identify the specific bottleneck. Could anyone advise on the most likely areas for network performance bottlenecks in Genode, especially under TCG emulation, and how to effectively profile/debug them?
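One low-effort way to start localizing this would be to split the transfer into fixed-size chunks and time each chunk: if early chunks are fast and later ones stall, buffer/window exhaustion is a suspect; uniformly slow chunks point at per-call or per-byte overhead instead. A portable sketch (timed_chunks is a hypothetical helper; it uses std::chrono, where on Genode a Timer::Connection would serve the same purpose):

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Transfer `len` bytes in fixed-size chunks and record each chunk's
// wall-clock duration in milliseconds. `write_fn(ptr, n)` stands in
// for the socket send.
template <typename WriteFn>
std::vector<long> timed_chunks(WriteFn write_fn, const char *buf,
                               std::size_t len, std::size_t chunk)
{
    std::vector<long> ms_per_chunk;
    for (std::size_t off = 0; off < len; off += chunk) {
        std::size_t n = std::min(chunk, len - off);
        auto beg = std::chrono::steady_clock::now();
        write_fn(buf + off, n);
        auto end = std::chrono::steady_clock::now();
        ms_per_chunk.push_back(static_cast<long>(
            std::chrono::duration_cast<std::chrono::milliseconds>(end - beg)
                .count()));
    }
    return ms_per_chunk;
}
```

Logging the per-chunk times (e.g. for 1 MiB chunks) would show whether the ~20 s localhost transfer is spent uniformly or in bursts.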

Thank you for your assistance.

Can you please share the following information:

  • which ethernet device you configured for QEMU
  • which ethernet driver you are using - we have two sources (ipxe and linux)
  • how you used priorities - not using them at all, or using them sub-optimally, may have a bad impact
  • which microkernel you use - they may behave differently regarding scheduling and priorities

Just as a further note regarding multiple CPUs, even though it is potentially not relevant to you right now: if you want to utilize the multiple vCPUs you configure, you have to do so explicitly, e.g. by creating one thread per CPU and doing useful work there. There is no automatic migration of threads across CPUs - none of the microkernels supports that - but Linux of course does. Keep that in mind when using multiple CPUs.

Thanks for your reply!

I haven’t manually changed my ethernet device, and it looks like the system is using e1000. The network driver seems to be lwIP? My own program’s priority is -1. I’m using the nova kernel.

here is my .run file:

#
# Build
#

create_boot_directory

import_from_depot [depot_user]/src/[base_src] \
                  [depot_user]/pkg/[drivers_nic_pkg] \
                  [depot_user]/src/init \
                  [depot_user]/src/libc \
                  [depot_user]/src/nic_router \
                  [depot_user]/src/vfs \
                  [depot_user]/src/vfs_lwip \
                  [depot_user]/src/vfs_pipe


build {
    core init timer app/monkey_concierge
}


#
# Generate config
#

install_config {

<config verbose="yes" prio_levels="4">
    <default-route>
        <any-service> <parent/> <any-child/> </any-service>
    </default-route>

    <parent-provides>
        <service name="LOG"/>
        <service name="PD"/>
        <service name="CPU"/>
        <service name="ROM"/>
        <service name="IRQ"/>
        <service name="RM"/>
        <service name="IO_MEM"/>
        <service name="IO_PORT"/>
    </parent-provides>

    <default caps="200"/>


    <start name="timer" priority="0">
        
        <resource name="RAM" quantum="2M"/>
        <provides> <service name="Timer"/> </provides>
    </start>


    <start name="drivers" caps="1200" managing_system="yes" priority="0">
        
        <resource name="RAM" quantum="256M"/>
        <binary name="init"/>
        <route>
            <service name="ROM" label="config"> <parent label="drivers.config"/> </service>
            <service name="Timer"> <child name="timer"/> </service>
            <service name="Uplink"> <child name="nic_router"/> </service>
            <any-service> <parent/> </any-service>
        </route>
    </start>

    <start name="nic_router" caps="400" priority="0">
        
        <resource name="RAM" quantum="128M"/>
        <provides>
            <service name="Nic"/>
            <service name="Uplink"/>
        </provides>
        <config verbose_domain_state="yes">


            <policy label_prefix="monkey_concierge" domain="downlink"/>
            <policy label_prefix="drivers" domain="uplink"/>

            <domain name="uplink">

                <nat domain="downlink"
                     tcp-ports="16384"
                     udp-ports="16384"
                     icmp-ids="16384"/>

                <tcp-forward port="10000" domain="downlink" to="10.0.3.2"/>

            </domain>


            <domain name="downlink" interface="10.0.3.1/24">

                <dhcp-server ip_first="10.0.3.2" ip_last="10.0.3.2">
                    <dns-server ip="8.8.8.8"/>
                    <dns-server ip="1.1.1.1"/>
                </dhcp-server>

                <tcp dst="0.0.0.0/0"><permit-any domain="uplink" /></tcp>
                <udp dst="0.0.0.0/0"><permit-any domain="uplink" /></udp>
                <icmp dst="0.0.0.0/0" domain="uplink"/>

            </domain>

        </config>
    </start>



    <start name="monkey_concierge" caps="2000" priority="-1">
        
        <resource name="RAM" quantum="1G"/>

        <config>
            <vfs>
                <dir name="dev"> <log/> </dir>
                <dir name="socket"> <lwip dhcp="yes"/> </dir>
                <dir name="pipe"> <pipe/> </dir>
            </vfs>
            <libc stdout="/dev/log" socket="/socket" pipe="/pipe"/>

        </config>

        <route>
            <service name="Nic"> <child name="nic_router"/> </service>
            <any-service> <parent/> <any-child/> </any-service>
        </route>
    </start>


</config>

}


#
# Boot image
#


build_boot_image [build_artifacts]

append qemu_args " -nographic -smp 8 -m 12G -machine q35  -cpu Icelake-Server "
append_qemu_nic_args "hostfwd=tcp:127.0.0.1:5555-:10000"

run_genode_until forever

Thank you very much!

Ok, this means lwIP. Could you try running

make KERNEL=nova BOARD=pc run/netperf_lwip

on hardware, and thus verify whether the numbers on your system are really as bad as they seem? The baseline should be somewhere around 900 MBit/s. Thanks.

For nova, I would suggest leaving the timer alone on priority 0 and putting all other components at least one priority level below it, to avoid timing anomalies under load. The network driver is, according to your configuration, the one ported from Linux. lwip (lwIP - A Lightweight TCP/IP stack - Summary [Savannah]) stands for the IP stack in use - it is not a device driver. Besides lwip, lxip can also be selected, which is the Linux IP stack ported to Genode.
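To make the priority suggestion concrete, here is a sketch of how the posted config could be arranged (same component names as in the .run file above; only the priority attributes differ, everything else elided):

```
<config verbose="yes" prio_levels="4">

    <!-- the timer stays alone on the highest priority (0) -->
    <start name="timer" priority="0"> ... </start>

    <!-- all other components run at least one level below -->
    <start name="drivers" priority="-1"> ... </start>
    <start name="nic_router" priority="-1"> ... </start>
    <start name="monkey_concierge" priority="-2"> ... </start>

</config>
```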