I’m a little confused by this, because wouldn’t this only necessarily be the case if the protocol used for networking is synchronous?
I’m sorry that I jumped to conclusions. When I looked at the code, a ping-pong scenario (like TFTP) sprang to mind. But in your post you mentioned the use of a traffic generator. Thank you for giving the idea a try anyway and dispelling my assumption.
[…] the NIC might monopolise CPU time […]
For analyzing the CPU utilization, let me recommend the trace_logger utility that you can find at repos/os/src/app/trace_logger/. It uses the TRACE service of Genode’s core component to obtain the CPU cycles consumed by each active thread over time. To give it a try, you may modify the lighttpd_fetchurl.run script as follows:
diff --git a/repos/ports/run/lighttpd_fetchurl.run b/repos/ports/run/lighttpd_fetchurl.run
index bd403cbc39f..108b1e6962c 100644
--- a/repos/ports/run/lighttpd_fetchurl.run
+++ b/repos/ports/run/lighttpd_fetchurl.run
@@ -10,8 +10,8 @@ if {[have_board virt_qemu_riscv]} {
}
proc ip_stack { } { return lwip }
-proc download_size { } { return "1M" }
-proc audit { } { return 1 }
+proc download_size { } { return "1000M" }
+proc audit { } { return 0 }
build "lib/vfs_[ip_stack] lib/libc app/fetchurl lib/curl"
@@ -32,6 +32,7 @@ import_from_depot [depot_user]/src/[base_src] \
[depot_user]/src/vfs_[ip_stack] \
[depot_user]/src/vfs_audit \
[depot_user]/src/vfs_pipe \
+ [depot_user]/pkg/trace_logger \
[depot_user]/src/zlib
proc lighttpd_config { } {
@@ -59,6 +60,7 @@ install_config {
<service name="IRQ"/>
<service name="IO_PORT"/>
<service name="IO_MEM"/>
+ <service name="TRACE"/>
</parent-provides>
<default-route>
<any-service> <parent/> <any-child/> </any-service>
@@ -147,6 +149,23 @@ install_config {
</config>
</start>
+ <start name="timer">
+ <resource name="RAM" quantum="1M"/>
+ <provides><service name="Timer"/></provides>
+ </start>
+
+ <start name="trace_logger" caps="200">
+ <resource name="RAM" quantum="16M"/>
+ <config verbose="yes" session_ram="9M" session_arg_buffer="1M"
+ period_sec="5" default_policy="null" default_buffer="4K">
+ <default-policy/>
+ </config>
+ <route>
+ <service name="TRACE"> <parent label=""/> </service>
+ <any-service> <parent/> <any-child/> </any-service>
+ </route>
+ </start>
+
</config>}
build_boot_image [build_artifacts]
The patch
- removes the logging of VFS interactions (audit),
- bumps the download size to 1000M,
- adds the trace_logger to the scenario by adding its pkg to the import_from_depot function,
- starts the timer service (needed by the trace_logger for its periodic reporting), and
- starts the trace_logger with the null policy to omit the tracing of RPCs while still reporting the CPU cycles consumed by each thread every 5 seconds.
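With the patch applied, the scenario can be started like any other Genode run script from a build directory prepared for the target board, e.g., via 'make run/lighttpd_fetchurl' (kernel and board arguments such as KERNEL=hw BOARD=pinephone depend on your setup and are merely an example here).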
When I start the scenario on the PinePhone, I get a table as follows:
[init -> trace_logger]
[init -> trace_logger] Report 12
[init -> trace_logger]
[init -> trace_logger] PD "kernel" -------------------------------------------------------------
[init -> trace_logger] Thread "idle" at (0,0) ATTACHED total:158253919 recent:0
[init -> trace_logger] Thread "idle" at (1,0) ATTACHED total:0 recent:0
[init -> trace_logger] Thread "idle" at (2,0) ATTACHED total:0 recent:0
[init -> trace_logger] Thread "idle" at (3,0) ATTACHED total:0 recent:0
[init -> trace_logger]
[init -> trace_logger] PD "init" ---------------------------------------------------------------
[init -> trace_logger] Thread "init" at (0,0) ATTACHED total:148584 recent:0
[init -> trace_logger] Thread "ep" at (0,0) TRACED total:2297321 recent:0
[init -> trace_logger]
[init -> trace_logger] PD "init -> nic_router" -------------------------------------------------
[init -> trace_logger] Thread "root" at (0,0) ATTACHED total:44299 recent:0
[init -> trace_logger] Thread "signal_proxy" at (0,0) ATTACHED total:3308 recent:0
[init -> trace_logger] Thread "nic_router" at (0,0) TRACED total:7837729 recent:650534
[init -> trace_logger] Thread "ep" at (0,0) TRACED total:185400974 recent:15709180
[init -> trace_logger]
[init -> trace_logger] PD "init -> lighttpd" ---------------------------------------------------
[init -> trace_logger] Thread "lighttpd" at (0,0) ATTACHED total:60677 recent:0
[init -> trace_logger] Thread "ep" at (0,0) TRACED total:673275788 recent:57378535
[init -> trace_logger]
[init -> trace_logger] PD "init -> fetchurl" ---------------------------------------------------
[init -> trace_logger] Thread "fetchurl" at (0,0) ATTACHED total:60570 recent:0
[init -> trace_logger] Thread "ep" at (0,0) TRACED total:392392278 recent:33509079
[init -> trace_logger]
[init -> trace_logger] PD "init -> timer" ------------------------------------------------------
[init -> trace_logger] Thread "timer" at (0,0) TRACED total:406402 recent:28967
[init -> trace_logger] Thread "ep" at (0,0) TRACED total:25855050 recent:2173874
[init -> trace_logger] Thread "root" at (0,0) TRACED total:145896 recent:0
[init -> trace_logger] Thread "signal_proxy" at (0,0) TRACED total:9023 recent:0
[init -> trace_logger]
[init -> trace_logger] PD "init -> trace_logger" -----------------------------------------------
[init -> trace_logger] Thread "trace_logger" at (0,0) TRACED total:125072 recent:4238
[init -> trace_logger] Thread "ep" at (0,0) TRACED total:19370723 recent:1962548
[init -> trace_logger]
[init -> trace_logger] PD "init -> report_rom" -------------------------------------------------
[init -> trace_logger] Thread "report_rom" at (0,0) ATTACHED total:62137 recent:0
[init -> trace_logger] Thread "ep" at (0,0) TRACED total:7606472 recent:598562
[init -> trace_logger] Thread "root" at (0,0) TRACED total:23612 recent:0
[init -> trace_logger] Thread "signal_proxy" at (0,0) TRACED total:3465 recent:0
[init -> trace_logger]
It tells me that the CPU has not been idle during the most recent 5 seconds. The scenario is CPU-bound as expected. The dominating CPU consumers are lighttpd (57M ticks), fetchurl (34M ticks), the NIC router (16M ticks), and the trace logger (2M ticks).
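Putting the recent values of the table in relation (all numbers taken from the report above): summed over all threads, roughly 112M ticks were consumed during the period, of which lighttpd accounts for about 57M (roughly 51%), fetchurl for about 34M (roughly 30%), and the NIC router for about 16M (roughly 15%).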
Another thing to consider might be DMA and copying.
That’s right. For investigating this hypothesis, another tool comes in handy, namely the GENODE_LOG_TSC and GENODE_TRACE_TSC macros of base/log.h. Let’s say we want to know how much time the TCP/IP stack spends copying data. A grep for memcpy within libports/src/lib/vfs/lwip brings nic_netif.h to my attention (sys_arch.cc might also be of interest). The following modification introduces a scope around the code portion I want to measure and puts the GENODE_TRACE_TSC macro at the beginning of this scope:
diff --git a/repos/libports/src/lib/vfs/lwip/nic_netif.h b/repos/libports/src/lib/vfs/lwip/nic_netif.h
index 9a5433627c1..cf706a93dbe 100644
--- a/repos/libports/src/lib/vfs/lwip/nic_netif.h
+++ b/repos/libports/src/lib/vfs/lwip/nic_netif.h
@@ -400,12 +400,16 @@ class Lwip::Nic_netif
* We iterate over the pbuf chain until we have read the entire
* pbuf into the packet.
*/
+ {
+ GENODE_TRACE_TSC(5000);
+
char *dst = tx.packet_content(packet);
for (struct pbuf *q = p; q != nullptr; q = q->next) {
char const *src = (char const *)q->payload;
Genode::memcpy(dst, src, q->len);
dst += q->len;
}
+ }
tx.try_submit_packet(packet);
_wakeup_scheduler.schedule_nic_server_wakeup();
According to the argument passed to GENODE_TRACE_TSC, the time-stamp counter is logged to the trace buffer after every 5000 executions of the code. There is also a GENODE_LOG_TSC variant that prints messages to the log, but this is much more heavy-weight than tracing. With this change, the output of the trace logger is supplemented with messages like this:
[init -> trace_logger] PD "init -> lighttpd" ---------------------------------------------------
[init -> trace_logger] Thread "lighttpd" at (0,0) ATTACHED total:60713 recent:0
[init -> trace_logger] Thread "ep" at (0,0) TRACED total:440991386 recent:57509127
[init -> trace_logger] 28867784731: TSC linkoutput: 1172M (295000 calls, last 4076)
[init -> trace_logger] 29411984170: TSC linkoutput: 1191M (300000 calls, last 3684)
[init -> trace_logger] 29879209261: TSC linkoutput: 1211M (305000 calls, last 3748)
[init -> trace_logger] 30325553506: TSC linkoutput: 1229M (310000 calls, last 3604)
[init -> trace_logger] 30761963089: TSC linkoutput: 1249M (315000 calls, last 3587)
[init -> trace_logger] 31214243710: TSC linkoutput: 1269M (320000 calls, last 6169)
[init -> trace_logger] 31641200854: TSC linkoutput: 1288M (325000 calls, last 3606)
[init -> trace_logger] 32175499318: TSC linkoutput: 1307M (330000 calls, last 3816)
[init -> trace_logger]
[init -> trace_logger] PD "init -> fetchurl" ---------------------------------------------------
[init -> trace_logger] Thread "fetchurl" at (0,0) ATTACHED total:60829 recent:0
[init -> trace_logger] Thread "ep" at (0,0) TRACED total:250211658 recent:32007396
[init -> trace_logger] 29764990276: TSC linkoutput: 90220K (160000 calls, last 501)
[init -> trace_logger] 30602302084: TSC linkoutput: 92853K (165000 calls, last 475)
[init -> trace_logger] 31447420543: TSC linkoutput: 95493K (170000 calls, last 482)
[init -> trace_logger] 32408192512: TSC linkoutput: 98128K (175000 calls, last 482)
[init -> trace_logger]
Note that TSC values must be handled with caution. Should a thread be preempted by the kernel, the TSC value would jump. However, in practice, in particular when inter-thread dependencies dominate the context switching, the TSC values can be correlated with the CPU cycles spent.
We can see that within one period (5 seconds), lighttpd spends about 200M cycles in this particular memory-copy operation: between the first and the last sample shown above, the accumulated value grows from 1172M to 1307M cycles while the time stamps advance by roughly 3.3G ticks (about 3.3 seconds), which extrapolates to around 200M cycles per 5-second period. In relation to the roughly 5G cycles spent per period (assuming that the TSC grows by 1G per second, which depends on the CPU), this is not the dominating cost. The major costs must come from somewhere else. For the client side (fetchurl), the costs are negligible, which is expected because the bulk data transfer happens from server to client.
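In case you wonder what such a measurement point does conceptually: it opens a scope-local measurement that accumulates the elapsed time-stamp-counter cycles and emits a summary every N invocations. The following is merely a sketch of this pattern under that assumption, not the actual base/log.h implementation, and the Tsc_probe and copy_loop names are made up for illustration:

/* illustration of the counting pattern only, not the real GENODE_TRACE_TSC */

#include <base/log.h>
#include <trace/timestamp.h>

struct Tsc_probe
{
	typedef Genode::Trace::Timestamp Timestamp;

	unsigned long long &_total;   /* cycles accumulated over all calls */
	unsigned           &_calls;   /* how often the scope was entered   */
	unsigned const      _period;  /* emit a summary every N calls      */
	char const * const  _name;    /* label shown in the output         */

	Timestamp const _start = Genode::Trace::timestamp();

	Tsc_probe(unsigned long long &total, unsigned &calls,
	          unsigned period, char const *name)
	: _total(total), _calls(calls), _period(period), _name(name) { }

	~Tsc_probe()
	{
		Timestamp const last = Genode::Trace::timestamp() - _start;
		_total += last;
		if (++_calls % _period == 0)
			Genode::log("TSC ", _name, ": ", _total,
			            " (", _calls, " calls, last ", last, ")");
	}
};

void copy_loop()
{
	static unsigned long long total = 0;
	static unsigned           calls = 0;
	{
		/* measure the cycles spent in this scope, report every 5000 calls */
		Tsc_probe probe { total, calls, 5000, "copy_loop" };

		/* ... code to measure ... */
	}
}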
Further performance analysis comes down to finding good spots for placing GENODE_TRACE_TSC measurement points. By tweaking the argument of GENODE_TRACE_TSC, you can also get a feel for how often a given code path is executed. E.g., the following instrumentation within the libc tells me how often the libc-using application wakes up the remote side (the NIC router).
diff --git a/repos/libports/src/lib/libc/internal/kernel.h b/repos/libports/src/lib/libc/internal/kernel.h
index 12617364958..255f7500289 100644
--- a/repos/libports/src/lib/libc/internal/kernel.h
+++ b/repos/libports/src/lib/libc/internal/kernel.h
@@ -726,6 +726,7 @@ struct Libc::Kernel final : Vfs::Read_ready_response_handler,
void wakeup_remote_peers()
{
_libc_env.vfs_env().io().commit();
+ GENODE_LOG_TSC(1000);
}
};
I tweaked the value until I got messages at a rhythm of approximately one per second, so I can infer that the code is executed about 1000 times per second.
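In general, if GENODE_LOG_TSC(N) yields one message every T seconds, the instrumented code path executes roughly N/T times per second, which is the rule of thumb applied here.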
I hope you will find these instruments helpful for your further analysis.