Hello Genode community,
I’m seeking guidance on unexpectedly low network performance while running Genode in a virtualized environment. In my measurements, Genode achieves only about one tenth of the network throughput of Linux under identical setup conditions.
I would appreciate any insights on potential bottlenecks or best practices for high-speed networking within Genode, as my goal is to understand what is limiting the performance.
Test Setup and Results
I conducted the following experiments on a bare-metal x86 server with 32 CPU cores and 64 GB of RAM. The virtualization platform was QEMU (x86_64) without KVM acceleration, i.e., pure TCG software emulation.
For all guests, I allocated 8 CPU cores and 16 GB of RAM. The test is a simple large-block network copy program that transfers a 128 MB buffer.
1. Cross-VM Network Test (Client → Server)
I ran two separate VMs side-by-side:
| Setup | Throughput Result | Comparison |
|---|---|---|
| Linux (archiso) | 71077 KB/s | (Baseline) |
| Genode | 3750 KB/s | 5.3% of Linux’s speed |
The performance difference between the two operating systems is significant.
2. Single-VM Localhost Test (Intra-VM Networking)
To isolate the issue from cross-VM network emulation overhead, I ran both the client and server components within a single Genode VM (localhost communication, similar to ip_raw.inc).
| Setup | Throughput Result |
|---|---|
| Single Genode VM (Localhost) | 6547 KB/s |
This result, while better than the cross-VM test, is still surprisingly low for an internal loopback transfer of 128MB.
Code Implementation Details
The core transfer mechanism uses the following structure. net::Socket4 is a minimal wrapper around standard libc socket functions (read/write).
The transfer size is defined as static const long TEST_SIZE = 128 * 1024L * 1024; (128 MB).
My client code:
static const long TEST_SIZE = 128 * 1024L * 1024;
static const long BUF_SIZE  = TEST_SIZE;
static char buf[BUF_SIZE];

void localhostNetCli(Libc::Env& env)
{
    Genode::Heap heap { env.ram(), env.rm() };
    adl::defaultAllocator.init({
        .alloc = [] (adl::size_t size, void* data) {
            return reinterpret_cast<Genode::Heap*>(data)->alloc(size);
        },
        .free = [] (void* addr, adl::size_t size, void* data) {
            reinterpret_cast<Genode::Heap*>(data)->free(addr, size);
        },
        .data = &heap
    });

    for (long i = 0; i < BUF_SIZE; i++) {
        buf[i] = char(i % 256);
    }

    Timer::Connection timer {env};

    net::Socket4 sock;
    sock.ip.set("10.0.3.3");
    sock.port = 12000;

    /* wait for the server to come up */
    Timer::Connection sleepTimer {env};
    sleepTimer.msleep(6000);

    if (sock.connect() != monkey::Status::SUCCESS) {
        MONKEY_LOG_ERROR("Socket connect error (10.0.3.3).");
        sock.ip.set("10.0.3.2");
        if (sock.connect() != monkey::Status::SUCCESS) {
            MONKEY_LOG_ERROR("Socket connect error (10.0.3.2).");
            MONKEY_LOG_ERROR("------ CRITICAL ------");
            return;
        }
    }

    auto beg = timer.curr_time();
    sock.send(buf, TEST_SIZE);

    char recvBuf[20];
    sock.recv(recvBuf, 2);
    auto end = timer.curr_time();

    auto durationMSecs = end.trunc_to_plain_ms().value - beg.trunc_to_plain_ms().value;
    MONKEY_LOG_INFO("rate: ", BUF_SIZE * 1000L / 1024 / durationMSecs, " KB/s");
}

void Libc::Component::construct(Libc::Env& env)
{
    Libc::with_libc([&] () {
        localhostNetCli(env);
    });
}
My server code:
static const long TEST_SIZE = 128 * 1024L * 1024;
static const long BUF_SIZE  = TEST_SIZE;
static char buf[BUF_SIZE];

void localhostNetSvr(Libc::Env& env)
{
    Genode::Heap heap { env.ram(), env.rm() };
    adl::defaultAllocator.init({
        .alloc = [] (adl::size_t size, void* data) {
            return reinterpret_cast<Genode::Heap*>(data)->alloc(size);
        },
        .free = [] (void* addr, adl::size_t size, void* data) {
            reinterpret_cast<Genode::Heap*>(data)->free(addr, size);
        },
        .data = &heap
    });

    net::Socket4 listenSocket;
    listenSocket.ip.set("0.0.0.0");
    listenSocket.port = 12000;
    listenSocket.start();

    net::Socket4 client = listenSocket.accept();
    Genode::warning(__func__, " accepted client.");

    Timer::Connection timer {env};
    auto beg = timer.curr_time();
    client.recv(buf, TEST_SIZE);
    auto end = timer.curr_time();

    client.send("OK", 2);
    client.close();

    auto durationMSecs = end.trunc_to_plain_ms().value - beg.trunc_to_plain_ms().value;
    MONKEY_LOG_INFO("rate: ", TEST_SIZE * 1000L / 1024 / durationMSecs, " KB/s");
}

void Libc::Component::construct(Libc::Env& env)
{
    Libc::with_libc([&] () {
        localhostNetSvr(env);
    });
}
net::Socket4 is a thin wrapper around the libc socket API:
adl::int64_t PromisedSocketIo::recv(void* buf, adl::size_t len)
{
    adl::size_t sum = 0;
    while (sum < len) {
        adl::int64_t curr = read(socketFd, ((char*) buf) + sum, len - sum);
        if (curr <= 0) {
            return -1;
        }
        sum += size_t(curr);
    }
    return adl::int64_t(sum);
}

adl::int64_t PromisedSocketIo::send(const void* buf, adl::size_t len)
{
    return ::write(socketFd, buf, len);
}
Question
Given that the low throughput persists even in the localhost test, the bottleneck appears to lie within Genode's internal mechanisms. Candidates I can think of:

- The IPC overhead required by the network stack (lwIP) or socket communication.
- The efficiency of the network stack implementation itself.
- The overhead of large read/write calls in the VFS/libc layers.
- The performance impact of QEMU TCG emulation on microkernel interactions.
I need to profile and identify the specific bottleneck. Could anyone advise on the most likely areas for network performance bottlenecks in Genode, especially under TCG emulation, and how to effectively profile/debug them?
Thank you for your assistance.