I’ve been testing Genode on two systems with different processors (4th-gen core i7 and Celeron). While I expect the Celeron to be slower, the performance gap with Genode is far larger than anticipated, especially compared to Linux on the same Celeron hardware.
I suspect that the microkernel architecture, with its IPC and task-switching overhead, is causing these bottlenecks. I’ve already tried the priority attribute of components, CPU core affinity, and explicit processing-unit allocation as described in the documentation, but observed no improvement. I’m seeking further guidance on tools or methods to identify and reduce these bottlenecks.
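For reference, the kind of init configuration I experimented with looks roughly like this (the component name and the concrete values are just placeholders):

```xml
<config prio_levels="4">

  <!-- partition the available CPUs into a 2x1 affinity space -->
  <affinity-space width="2" height="1"/>

  <start name="my_component" priority="-1" caps="200">
    <!-- pin the component to the second CPU -->
    <affinity xpos="1" ypos="0" width="1" height="1"/>
    <resource name="RAM" quantum="8M"/>
  </start>

</config>
```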
It seems you raise the same concerns already mentioned in “Performance Issues with Sculpt OS and Falkon on Celeron”. Without knowing your exact scenario, it is hard to tell where your bottlenecks actually lie, but I would not point at the underlying system architecture prematurely.
One big difference that will hit you is that pixel pushing is currently done only on the CPU in Genode. So you profit from a faster CPU, while a slow CPU will worsen your interactive experience when it comes to compositing and blitting pixels, as time spent there is missing for other tasks. In the same vein, using a high-resolution display will make that apparent¹.
One experiment you could try is to configure your Linux appliance so that the GPU is not utilized and the graphics output is not accelerated, to level the playing field when drawing comparisons.
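If it helps, two common knobs for forcing software rendering on Linux are the following environment variables (standard Mesa and QtWebEngine mechanisms, nothing Genode-specific):

```sh
# let Mesa fall back to its software rasterizer instead of the GPU
LIBGL_ALWAYS_SOFTWARE=1 falkon

# for QtWebEngine-based browsers like Falkon, GPU use can also be
# switched off via Chromium flags
QTWEBENGINE_CHROMIUM_FLAGS="--disable-gpu" falkon
```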
Now, if you want to investigate where the time is spent on Genode, I suggest taking a look at the top component, which periodically prints the execution time of each component to the LOG.
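If you are not on Sculpt, a start node for it could look roughly like the following sketch; the reporting-period attribute is an assumption on my side, so please double-check it against the component’s documentation:

```xml
<start name="top">
  <resource name="RAM" quantum="2M"/>
  <!-- period_ms is assumed here; consult the component's README -->
  <config period_ms="5000"/>
  <route>
    <!-- the TRACE service is provided by core -->
    <service name="TRACE"> <parent/> </service>
    <any-service> <parent/> </any-service>
  </route>
</start>
```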
¹) Every now and then I test Sculpt on a Shuttle DS57U (Celeron 3205U) that has 2 cores clocked at 1.5 GHz. Driving a smaller 1080p display works well enough, albeit one core is heavily utilized when nit_fader or a full-screen component like a game is active. Using my normal display with 2560x2880 pixels pushes the system over the edge, where nit_fader exhibits noticeable artifacts, and since the system only has 2 cores, spreading the load across more cores is unfortunately not possible.
Since your experience varies so much between different machines, I wonder: have you scrutinized the BIOS settings of your Celeron? It’s just a shot in the dark, but should the BIOS be configured for preserving energy rather than maximum performance, this setting would affect Genode much more strongly than Linux, which manages P-states automatically by default.
As a general remark, performance is more a function of effort than of the often cited “microkernel overhead”. Over the course of the past 15 years, our team at Genode Labs has conducted plenty of projects that entailed specific and measurable performance goals (e.g., achieving 90% of the disk throughput of Linux). We never missed such goals. But it took effort to find the bottlenecks and to design ways to overcome them. Often the culprit lay in the choice of data structures (like allocators), mere configuration (like using sensible I/O buffer sizes), or global system effects (like setting up the right CPU frequency or using appropriate caching attributes).
If your inquiry has a commercial background, you may consider contracting Genode Labs for conducting such analysis and optimization work.
I saw the gpu_drv module in the Sculpt OS setup, so I figured it supported the GPUs listed in the config section. But after looking into it more, it seems like the driver is there, but it might not actually be working yet. Am I getting that right?
Also, based on your suggestion, I tried running Linux with the GPU turned off. The resolution dropped, but it didn’t really affect performance that much. So I thought lowering the resolution in my Genode code would also help a lot, but it didn’t make a noticeable difference in practice.
I checked the system BIOS and set the CPU to maximum performance, but it didn’t make a difference.
One of Genode’s key advantages over other frameworks—and of the NOVA microkernel over most other microkernels—is its high efficiency and responsiveness, even on low-power processors. But it looks like I’m hitting a bottleneck that might be slowing down some of Genode’s graphics handling. This could be due to the client-server setup of nitpicker, the built-in overhead from qt5, the way webview handles HTML5 rendering, or even the way CPU cycles are allocated. Perhaps CPU quantum could be managed more effectively through modules in the XML structure.
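For example, my understanding is that a CPU share can be assigned per component roughly like this (the values are just placeholders, and kernel support for CPU quota is required):

```xml
<start name="my_component" caps="200">
  <!-- reserve 25% of the CPU time for this component -->
  <resource name="CPU" quantum="25"/>
  <resource name="RAM" quantum="8M"/>
</start>
```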
[…] the built-in overhead from qt5, the way webview handles HTML5 rendering […]
Since this part is not only the least understood (by us developers) but also the most complex one by a wide margin, my gut feeling places the bottleneck there. To replace that vague feeling with facts, let me give you a few potential starting points for investigation.
It is crucial to find out whether the workload is CPU-bound or I/O-starved. Assuming the former, it is of most interest which component consumes the biggest share of CPU in the given scenario. To answer these questions, I recommend giving the following two tools a try, which are readily available on Sculpt OS.

First, Alex’ interactive top tool gives you a momentary view of the CPU utilization of the various components and individual threads. Should you find the nitpicker GUI server high up in the list, you know that my gut feeling was wrong. Should you see much idle time on all CPUs, the scenario is most likely starved on I/O for some reason. If you spot two closely related threads consuming much CPU, those threads may be overly chatty (why could that be?).

Second, the trace logger as found in the Options menu of Sculpt gives you a record of the CPU cycles of each component accumulated over time. This is useful in situations where the CPU load fluctuates a lot, which would render the mere momentary view of the top tool rather limiting. With the trace logger, you see the total CPU time spent.
You may leverage the fact that Genode executables can be started on Linux. So you can cross-correlate the behavior of a given component between Genode/NOVA and Genode/Linux. In cases like yours, where Falkon is available on native Linux and under Genode, you can furthermore compare the behavior between Falkon/Genode/Linux and Falkon/Linux. When using Linux as the kernel, you can utilize all the tooling (e.g., oprofile) available on Linux to investigate anomalies. As a precondition for this approach, one needs to come up with a predictable and stable scenario (e.g., largely removing side effects like network I/O) in the form of a run script that is readily executable on Genode/NOVA and Genode/Linux.
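As a rough sketch of that workflow (the run-script name is a placeholder), executing the same scenario on both kernels from a Genode build directory could look like this:

```sh
# execute the scenario on NOVA (bare metal or Qemu)
make run/my_scenario KERNEL=nova BOARD=pc

# execute the very same scenario hosted on Linux, where tools
# like oprofile can be applied to the Genode processes
make run/my_scenario KERNEL=linux BOARD=linux
```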
Once you suspect a certain code path of being the bottleneck (e.g., pixel-pushing graphics code), you can follow up on your suspicion by adding manual instrumentation using Genode’s handy TSC measuring utilities.
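As a minimal sketch of such an instrumentation, the time-stamp counter can be read via Genode’s Trace::timestamp utility; the instrumented function here is of course just a hypothetical stand-in for the code path you suspect:

```cpp
#include <trace/timestamp.h>  /* Genode::Trace::timestamp(), reads the TSC on x86 */
#include <base/log.h>

void blit_frame();  /* hypothetical suspect code path */

void measured_blit_frame()
{
	Genode::Trace::Timestamp const start = Genode::Trace::timestamp();

	blit_frame();

	Genode::Trace::Timestamp const end = Genode::Trace::timestamp();

	/* raw TSC ticks, not wall-clock time - compare values relative to each other */
	Genode::log("blit_frame took ", end - start, " TSC ticks");
}
```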
I saw the gpu_drv module in the Sculpt OS setup, so I figured it supported the GPUs listed in the config section. But after looking into it more, it seems like the driver is there, but it might not actually be working yet. Am I getting that right?
The driver is there and known to work on at least the devices given in its default config (which is pretty much the list of devices tested with the driver). However, it is only used on demand by specific components, like the morph-browser, vbox6 (needs to be enabled in the machine.vbox file), and a few games, but not in general.
Also, based on your suggestion, I tried running Linux with the GPU turned off. The resolution dropped, but it didn’t really affect performance that much. So I thought lowering the resolution in my Genode code would also help a lot, but it didn’t make a noticeable difference in practice.
Well, I mostly focused on the effects you mentioned and that I can observe on a low-spec’d test machine, where using a GPU would make a difference. That being said, as @nfeske already addressed, this scenario is a particularly complex one, and following his pointers is certainly the first step in figuring out what’s going on.