Proposed HRD File Format

This post is in response to the proposed new HRD file format, as introduced in Moving on from XML? A teaser for a possible alternative. (Speaking of which, should we have a new category for Genodians articles?)

This is a very interesting proposal. As usual, a great deal of thought has gone into this, resulting in a clean, minimal design (i.e. the “Genode Way”).

Personally, I count myself on the “love” side of the XML love/hate divide. In my day job, I have been using a minimal subset of XML (same as Genode uses) for all of my configuration data for at least 20 years, with no regrets. I’m not sure why people find it to be a barrier; I guess it’s just a matter of taste.

In any case, since there is a 1:1 correspondence with the XML, there really is no downside other than the effort to convert the existing XML files to the HRD format. Which brings me to a crazy idea…

Since there is indeed a 1:1 correspondence between XML and HRD, could you add a thin abstraction layer to the existing XML classes, to handle either format transparently to the callers?

The first non-whitespace character should be sufficient to auto-detect the format when reading. This would allow you to convert the files at a leisurely pace (or leave them in any combination), without having to change any existing component code (except when it is required to force one format or the other).
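The auto-detection idea could be sketched roughly as follows. This is only an illustrative Python sketch, not Genode code: the function name `detect_format` is hypothetical, and it assumes that an XML document always starts with `<` and that anything else can be treated as HRD (the thread does not say which characters an HRD document may start with).

```python
def detect_format(text: str) -> str:
    """Guess the notation of a config or report from its first non-whitespace character."""
    stripped = text.lstrip()
    if not stripped:
        return "empty"
    # Assumption: XML always opens with '<' (tag, comment, or declaration).
    if stripped[0] == "<":
        return "xml"
    # Assumption: everything that is not XML is handed to the HRD reader.
    return "hrd"


print(detect_format('  <config verbose="yes"/>'))  # xml
print(detect_format("config\n + vfs\n"))           # hrd
```

A caller could use the result to pick one of two readers, so that component code stays unaware of the notation.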

I don’t know if this idea is feasible or not, but it might make life a lot easier.

I also have a slightly silly syntax question: Do I infer correctly that consecutive pipes are ignored? For example, that the first two columns of pipes in the display drivers connector report example are merely for human organization, and not required for parsing?

Happy Sculpting!

4 Likes

Thank you John for the affirmative feedback.

Since there is indeed a 1:1 correspondence between XML and HRD, could you add a thin abstraction layer to the existing XML classes, to handle either format transparently to the callers?

Great to see your imagination! That’s not a crazy idea at all, but a sensible step of a migration path. I foresee that the 1:1 correspondence between XML and HRD will make the transition rather painless if it is supported by decent tooling and pursued in carefully chosen steps. The (intermediate) ability of Xml_node to accept both formats would be such a step.

Depending on the feedback about the proposed syntax, I plan to publish two follow-up articles, one introducing the hrd tool I vaguely mentioned (for converting and manipulating HRD and XML), and one presenting further ideas for the migration.

Do I infer correctly that consecutive pipes are ignored? For example, that the first two columns of pipes in the display drivers connector report example are merely for human organization, and not required for parsing?

That’s perfectly right. It visually supports the human sense of scope and topology but is not needed by the machine. The use of those symbols will very much depend on the individual use case. Similarly, the tabular display of sub nodes is not mandated. When spending no thought on the formatting, it still looks logical and rather clean. But when using the option to bring such sub nodes to the same line, a nice table appears, which aids human parsing.
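To illustrate the point about pipes being decoration, here is a small Python sketch. It is an assumption-laden illustration rather than the real HRD parser: the helper name `normalize_guides` is made up, and it only treats pipes in the leading run of a line (before the first other character) as guides. Pipes that appear later in a line are left untouched, since they may carry meaning there.

```python
def normalize_guides(line: str) -> str:
    """Replace leading '|' guide characters with spaces, keeping columns intact."""
    for i, ch in enumerate(line):
        if ch not in " |":
            # Everything before position i was spaces or guides.
            return " " * i + line[i:]
    # The line consists only of spaces and guides.
    return " " * len(line)


with_guides = "   | + zero"
without_guides = "     + zero"
print(normalize_guides(with_guides) == without_guides)  # True
```

With this normalization, a line written with guides and the same line written with plain spaces look identical to whatever reads them next.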

2 Likes

Was planning to post on this (before going back to the mailing-list for the “year debrief” thread) but John beat me to it ^^

I’ll first post my understanding of the situation re XML configuration, before adding random thoughts.

In the past years the XML config was sometimes the object of discussions. Some focused on the XML syntax itself, and some on the semantic aspect it is used for, i.e. configuring the system. The latter is one of the more hairy parts: the Genode framework fully empowers the system integrator to set “wires” and “pipes” to connect components with each other, to a degree that is unseen in other systems, I suspect. All OSes strike a balance between hardcoding configuration in (C++ ?) code on one hand, or allowing the system designer (or even the end-user) to set the configuration in text files, like in the Linux /etc directory, on the other hand. Of the systems I am familiar with, most don’t allow for the outright removal of “lego bricks”, whereas Genode not only allows for it, but allows doing it by editing a textual config. One may ‘restack’ the storage stack, audio stack, net stack, etc, without writing one line of C++ code. One may remove (or add) layers in the stack, re-organize how the “lego bricks” deal with each other, make the system as small and agile or as full-featured as desired, just by redesigning the XML config… It helps hugely for QA/unit-testing/debugging, and of course it gives Genode its unbelievable flexibility for running on various hardware and performing various scenarios. That’s fantastic to have.

But “with great power, comes great responsibility” :wink: … All complexity which is normally (in other operating systems) handled in code, is (in Genode) exposed in the XML config instead. The fact that it is textual does not remove the obligation to “get things right”, in order to obtain a working system. It just makes it cleaner, being cleanly separated from the code (kinda like the modern web separates contents and presentation, HTML and CSS stylesheets, I guess), but the complexity is still there… Some of it unavoidable – we don’t want to remove the ability to connect components together in various ways… And some of it (I believe) might benefit from further improvements and ambiguity reduction.

So over the years, I’ve been pondering that latter aspect, more than the XML syntax aspect. That is, a possible move away from XML would (as far as I’m concerned) be more a means to an end, not so much an end in itself. From Norman’s article, with the hrd command-line tool that is hinted at, I’d guess that configuration might become easier, or at least benefit from more infrastructure for someone looking to make it easier: that new hrd command might serve as a building block for (at least in my case) the developer/integrator to build upon, to create some sort of ‘assistant’ tool that deals not just with the syntax (don’t forget to close every open tag and chevron etc) but also with the semantics (a report_rom must be connected at both ends, don’t forget the policy, etc).


Now for the ‘remarks in no particular order’:

Reading this sample I was wondering why the children of vfs didn’t have the ‘pipe’ char leading the ‘+’ char, whereas the children of dir did have pipes… But the above response addresses that: the pipe is optional:

config
 + vfs
   + tar bash-minimal.tar
   + tar coreutils-minimal.tar
   + tar vim-minimal.tar
   + dir dev
   | + zero
   | + null
   | + terminal
  • as I read the article, I find myself mentally translating the HRD samples to XML on-the-fly to understand them; it’s probably only natural since I’ve had just a few hours of exposure to them. Once I get more familiar with them in the coming months, I won’t have to worry so much about that.
  • agreed it would be important for the HRD class to have an as_xml() getter and for Xml_node to have a from_hdr() setter. That part of the code can be discarded in a year or two after the migration is complete (or moved outside of the TCB, like the printf/sscanf-like code that was moved last year) to keep a “tight ship”.
  • good that white spaces are not semantically significant :wink:
  • don’t expect wiki (or others) to suddenly stop harping on the “Genode uses XML” theme, they’ll surely find other subjects to gripe on – if the Genode community migrates away from XML we should do it for our own sake and because it empowers us, not to please people who don’t use Genode :wink:
  • SH (syntax highlighting): as I work with various text formats these days (often Genode related), including some out-of-the-beaten-path ones with no obvious S.H. option, I tend to use tricks to syntax-highlight them, with some degree of success. For example, editing Genode .run files or my own Jamfiles, I switch between XML mode and something like Bash/Shell mode, depending on whether I’m reviewing/editing the XML-like part of the file or the one with # pound signs that denote commented-out lines. Will I find a S.H. mode in my editor that matches the HRD format? It uses tokens I’ve never seen before, for commenting out a line etc, so the jury is still out. I’ll take a look, and maybe craft a new mode from scratch if none of the existing ones match, we’ll see. (And maybe someone else will experiment and report “such and such S.H. mode works well for me”.)
  • admittedly, part of the need for S.H. relates to the “noise” inherent to the XML syntax: once you remove that noise (chevrons etc), then there might not be all that much stuff to highlight. But the “remmed-out line” part seems worth it to me, especially with the “x” character, which does not at all register instinctively as “comment out” in my mind. Another candidate would be “:” colon-prefixed embeddings.
  • plus, “comment” lines have a different prefix from “commented-out” lines (quote versus ‘x’)
  • as always with text data, there’s an infinite number of ways a hand-crafted config can go wrong and have typos… I suspect a big part of the “make or break” result for us developers will be how graceful the error handling will be. If we end up with ‘breakage’ situations that are impossible to get out from, that’ll be a lot of friction! We want to avoid that… Even if it’s very hard to foresee all the different typo types that exist and how to deal with them…
  • I can already detect one way that HRD will help me: in my Jamfiles (unlike in Genode’s .run files) I’m forced to escape XML quotes, due to Jam constraints. Since HRD makes little to no use of quotes, that’ll make things easier for me, at least in that respect.
  • Re. migration, I suppose one migration path would be 1) Genode Labs writes all new config contents in HRD, then 2) starts converting old XML contents to HRD too. Assuming Genode can support both during the migration period.
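As a rough illustration of the `as_xml()` idea from the list above, the following Python sketch converts a format-neutral in-memory tree into XML. Everything here is hypothetical: the `Node` class, its field names, and the `name` attribute on `dir` are assumptions made for the example, and nothing is implied about how Genode's actual classes look. The point is only that a 1:1 correspondence makes the conversion a plain recursive walk.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """Format-neutral tree node: a type, attributes, and sub nodes (hypothetical)."""
    type: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def as_xml(node: Node) -> ET.Element:
    """Map each node to one XML element, attribute by attribute, child by child."""
    elem = ET.Element(node.type, dict(node.attrs))
    for child in node.children:
        elem.append(as_xml(child))
    return elem


tree = Node("config", children=[
    Node("vfs", children=[
        Node("dir", {"name": "dev"}, [Node("zero"), Node("null")]),
    ]),
])
print(ET.tostring(as_xml(tree), encoding="unicode"))
```

The reverse direction would be the same walk with the roles swapped, which is why such glue code could plausibly live outside the base framework and be dropped later.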

So overall a careful :+1: from me as I trust the Genode team to take care of us as always ^^.

1 Like

When I read about the idea of moving away from XML, I imagined what could be proposed as a replacement. Later I found the article on Genodians and was completely surprised, because the proposed direction is totally different from what I had envisioned. So I waited with a response until I had organized everything in my head well enough to write a (hopefully valuable) opinion.

After this introduction it is probably not surprising that I’m not a fan of the proposed idea, and I’ll try to explain why.

I believe that we all hope that someday Genode and Sculpt become popular and will be broadly used by many people. In this optimistic scenario, the amount of configuration and data handled in XML format would be much bigger than it is today. That brings me to the conclusion that any changes in this base protocol for core components should be designed not with people in mind but for easy handling by applications. I think that all arguments raised by @nfeske are focused on problems with people consuming information, and I find them less relevant when discussing a proposal for a base data format of a broadly used system.

Given that, from a technical perspective I would consider the following features as nice to have in an ideal format and implementation for handling configuration and data:

  • parsing should be fast and data should be small
  • tools should exist for automatic validation of structure
  • automatic parsing to “in memory structures” should be possible
  • should be supported in many different languages
  • it should be possible to present data in a human readable format
  • format of data should be versioned (it should be possible to detect old version, read it and convert it)

I know that those nice-to-have features are highly influenced by my experience with Protocol Buffers and I know there are some issues that would probably make it impossible to use it in Genode, but I personally would find such a transition a great move forward.

For me, parsing functionality is not only an API for iterating over a tree and getting textual attributes. It should also give guarantees about the number of required elements and attributes, and about the values having proper types and lying within proper ranges, so that when you get parsed content you should not have to check the data for correctness.
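The kind of guarantee described here could look roughly like the following Python sketch. It is not tied to XML, HRD, or Protocol Buffers, and every name in it (`SCHEMA`, `validate`, the `name` and `caps` attributes, the range 1 to 10000) is an illustrative assumption. The idea is that the caller receives typed values or an error, and never has to re-check the data itself.

```python
# Hypothetical schema: required attributes with their types, plus value ranges.
SCHEMA = {
    "required": {"name": str, "caps": int},
    "ranges": {"caps": (1, 10000)},
}


def validate(attrs: dict, schema: dict) -> dict:
    """Return typed attributes, or raise ValueError if the contract is violated."""
    unknown = set(attrs) - set(schema["required"])
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    typed = {}
    for key, typ in schema["required"].items():
        if key not in attrs:
            raise ValueError(f"missing required attribute '{key}'")
        try:
            typed[key] = typ(attrs[key])  # type conversion from text
        except ValueError:
            raise ValueError(f"attribute '{key}' is not a valid {typ.__name__}") from None
    for key, (low, high) in schema["ranges"].items():
        if not low <= typed[key] <= high:
            raise ValueError(f"attribute '{key}' must be within {low}..{high}")
    return typed


print(validate({"name": "terminal", "caps": "100"}, SCHEMA))
```

Whether such checks belong in the parser, in a separate schema tool, or in each component is exactly the design question being debated in this thread.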

On the other hand, a “one to one” transition from one syntax to another does not bring enough value to justify the change. The same effect can be achieved for people by thin wrappers on top of a log viewer and a configuration editor. It would not be complicated to present all the data currently handled as XML in HRD, JSON, or whatever other format a specific user or developer prefers.

I believe that the current XML format sits somewhere in the middle between something like HRD and the kind of full-blown framework that I would find ideal. The arguments that make it preferable to HRD in my eyes are:

  • it is maybe not liked very much, but it is known,
  • there are parsers available for any programming language,
  • there are validators that can check for structure (not only syntax),
  • implementation already exists and works.

In summary I would suggest not making this switch (although I can learn this new syntax and get used to it if the decision is to make this transition).

Here are some other random thoughts that did not fit above, where I did not want to diverge from the main idea too much:

  • even logs on Linux and Windows are currently accessed using tools and not directly from log files (journalctl), so using tools for presenting/editing internal information should not be an issue in a stable OS;
  • I don’t think that criticism of a completely new format would be smaller than criticism of XML, critique is always louder;
  • browsers (and other tools) can present XML currently and allow for collapsing and expanding of elements, syntax highlighting (mentioned earlier), etc., which will probably never happen for custom format.

You might be planning to cover this in the Genodians articles, but is there a reason not to leave the ability to read XML in the code going forward? I don’t feel strongly about it one way or the other, but it seems harmless.

TTCoder, you bring up several interesting points here:

I think your “assistant” tool is a great idea (regardless of XML/HRD). For some reason, my mind always goes to a visual designer for this sort of thing. I may be in a world of my own, but this seems like a good way, not only to visualize the entire structure, but to highlight the sort of things you mention. Just when I start to think it’s impossible, Leitzentrale’s component graph demonstrates that it isn’t. :wink: I’m very curious to hear what you, Norman, and others think about the form this “assistant” tool might take.

Syntax highlighting is another interesting topic. I don’t know what’s involved in writing plug-ins (or whatever they are) to handle this, but it’s worth looking into.

One thing I should have mentioned - I love the fact that disabled elements are denoted differently from comments! (“Commenting out” lines of code/config is an abomination that cannot die fast enough IMHO). As Norman said, it’s not only more convenient and conceptually cleaner, but eliminates scoping errors.

Very interesting discussion!

2 Likes

I’m not hell bent on it being ‘visual’, mind you ^^

It’s just that I’m too big an idiot to do things right and trouble-shoot config problems sometimes.

So I can see 2 ways out of this, 1) the core Genode code goes out of its way to issue warning()s for every single mistake that I (or even bigger fools :grin: ) can possibly make, guiding me to the correct config – that seems like an impossibly large task for the Genode team, just to accommodate a small handful of recalcitrant devs; not to mention, this would enlarge the TCB with unnecessary code, which is a big no-no; or 2) we move the problem out of the TCB, probably into the lap of a third-party developer, in the form of a textual or visual tool that helps me build the config. I’ve been thinking about that for a few years, and will probably keep thinking about that unproductively for a few more years, so please carry on with the (more productive) rest of this thread, I don’t want to interrupt :wink:

S.H. wise, of the two editors I use for coding, one I can’t find at the moment, and the other uses a simple yaml-based S.H. config I’m not familiar with, but it looks like it might work for my purpose.

Interesting point about disabling versus commenting, over the years doing C++ code I always thought it would be nice to be able to ‘grep’ for commented out code, without being overwhelmed with actual comments. The solution I came up with is to use ‘///’ triple slashes, so that goes to show you’re on to something, even with this old grumpy dev here ^^
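The ‘///’ convention makes disabled code easy to find mechanically. As a small Python sketch (the function name `disabled_lines` and the sample source are made up for illustration), one could list the line numbers that start with exactly three slashes while skipping ordinary `//` comments:

```python
import re

# Matches lines whose first non-blank characters are exactly three slashes.
DISABLED = re.compile(r"^\s*///(?!/)")


def disabled_lines(source: str) -> list:
    """Return the 1-based numbers of lines marked as disabled code."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if DISABLED.match(line)
    ]


sample = (
    "// a real comment\n"
    "/// old_call();\n"
    "int x = 0; // trailing comment\n"
    "    /// another_old_call();\n"
)
print(disabled_lines(sample))  # [2, 4]
```

A distinct marker for disabled nodes, as proposed for HRD, would allow the same kind of search without relying on a personal convention.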

1 Like

With my background on the Commodore Amiga and the 32-bit IFF format, the visual approach is more appealing to me than a text-based format of any sort. Although one has to use a hex editor to debug a binary format, that is not much different from using a text editor without syntax highlighting. Plus, once the new format is ready, the editor will be more visually appealing than any text editor could ever render, even with syntax highlighting.

The best balance would be to have keywords as text and nesting levels of structures as a graphical collapsible tree gadget. Trying to make a text format humanly readable is not necessary but taking less space on disk and time to parse is preferable. If HRD is more compact than XML then that’s great but the humanly readable aspect of it doesn’t look so hopeful to me. What I hope for is that BSON would be to JSON what the new format would be to XML.

1 Like

I don’t have a strong opinion, but I’d like to add that I do think your new format is much prettier, and the way of ‘commenting out’ blocks in one line seems very handy if you’re not using an IDE.

I’m glad to see schemas and linting called out as invaluable. If the new format didn’t have a way to provide that I’d say it was a downgrade no matter how much better the aesthetics.

1 Like

Thanks a lot for sharing your thoughts.

Regarding what you write about the robustness of writing manual configurations, I sense that you have probably not yet made full use of the existing tooling, in particular the offline validation against XML schemas using xmllint. Quite a few components come with .xsd files like this that formally describe the configuration structure and arguments. The run tool uses those files to detect conflicts (not merely syntax but also bounds of values, attribute names etc.) at integration time, which is really useful.

Those files express some form of contract between the integrator and the component. They are already shipped in bin archives. I think it would be good to cultivate the automated use of such schemas (e.g., by Goa) further. It goes without saying that the HRD proposal is going to retain this concept.
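For readers who have not used this kind of offline check, here is a hedged Python sketch of calling xmllint on a config. The `--noout` and `--schema` options are standard xmllint options; the function names and the file names are placeholders, and how the run tool itself invokes xmllint is not shown in this thread.

```python
import subprocess


def xmllint_command(schema_path: str, config_path: str) -> list:
    """Build the xmllint invocation that validates a config against an XSD schema."""
    return ["xmllint", "--noout", "--schema", schema_path, config_path]


def config_is_valid(schema_path: str, config_path: str) -> bool:
    """Run xmllint (must be installed); a zero exit code means the config validates."""
    result = subprocess.run(
        xmllint_command(schema_path, config_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


# Placeholder file names, for illustration only.
print(" ".join(xmllint_command("config.xsd", "my_component.config")))
```

Running the printed command by hand gives the same result and shows the individual schema violations on standard error.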

good that white spaces are not semantically significant :wink:

Actually, they are, because the indentation denotes the relation of sub nodes to their parent nodes.
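To make the role of indentation concrete, here is a minimal Python sketch that rebuilds the parent/child relation from the sample posted earlier in this thread. It is not the real HRD parser: the function name `parse_tree` is made up, leading pipes are treated like spaces, and the text after `+` is kept as an opaque string because the attribute syntax is not spelled out in this thread.

```python
def parse_tree(source: str):
    """Build a nested {'text', 'children'} tree from indentation alone."""
    root = None
    stack = []  # (indent, node) pairs on the path from the root to the current node
    for raw in source.splitlines():
        # Leading spaces and '|' guides only contribute to the indentation depth.
        indent = len(raw) - len(raw.lstrip(" |"))
        body = raw[indent:]
        if not body:
            continue
        if body.startswith("+"):
            body = body[1:].strip()
        node = {"text": body, "children": []}
        # Pop siblings and deeper nodes until the parent is on top of the stack.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            stack[-1][1]["children"].append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("more than one top-level node")
        stack.append((indent, node))
    return root


SAMPLE = """config
 + vfs
   + tar bash-minimal.tar
   + dir dev
   | + zero
   | + null
"""
tree = parse_tree(SAMPLE)
vfs = tree["children"][0]
print([child["text"] for child in vfs["children"]])
```

Changing the indentation of `+ zero` to match `+ dir dev` would make it a child of `vfs` instead, which is the sense in which whitespace is significant.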

SH (syntax highlighting) […]

As a base line, I generally appreciate syntax that does not require crutches in the first place, and I think that HRD achieves that (like I said, as a base line). That said, syntax highlighting would be nice, especially to make comments stand out at the first glance. I see the addition of a HRD mode to one’s editor of choice as a fun intellectual puzzle.

I can already detect one way that HRD will help me: in my Jamfiles (unlike in Genode’s .run files) I’m forced to escape XML quotes […]

Same for embedding HRD in Tcl (i.e., run scripts). Today, using " Tcl strings for XML is a chore because one has to quote each single attribute value. So we mostly use { } blocks. Since HRD leaves " unused (except for rare corner cases where quoting cannot be avoided, namely specifying | or significant whitespace as attribute arguments), those collisions are no more.

So overall a careful :+1: from me as I trust the Genode team to take care of us as always ^^.

Thank you for your trust.

2 Likes

Thank you for having taken the time to express your concerns.

I think the following quote expresses well the misalignment of expectations.

My opinion is the polar opposite. The design of Genode is driven by the desire for a truly trustworthy OS. Trust requires predictability. Complex things are hard to predict. So in order to be trustworthy, Genode must be as simple and transparent as possible.

Having grown up in the world of binary formats (Atari home computers used binary formats everywhere), the predominant use of text throughout Unix was a revelation in terms of transparency. It’s universally good for a human forming trust in the operation of the system.

From a purely utilitarian point of view, syntax does not matter. The machine does not need syntax. From my perspective, however, syntax is the interface where the human meets the machine. It influences whether I personally enjoy using the machine or not. Clarity of syntax reinforces joy and trust.

Your posting voices the concern that Genode may miss out on the masses if we make unorthodox decisions. You extrapolate Genode’s future in a certain direction (“broadly used by many people”). I don’t. I look at what Genode is today, what experiences we made during the past two decades, and thereby identified an opportunity to make the system simpler, clearer, and more pleasant to use.

Sometimes, an idea crosses one’s mind that solves a specific problem. I like those ideas even though they are easily forgettable. Very rarely, an idea crosses one’s mind that feels more like a discovery. Such a discovery occupies one’s mind, it feels so obvious, it does not let go. HRD occupied my mind for two years before I wrote the article.

As I stated in the article, HRD is an exploratory undertaking. It’s not decided. The implementation of HRD parsers in different languages, tooling for automated conversion between HRD and XML, syntax highlighting for my favorite editor, the experimental use in Sculpt, or even a “Machine Readable Data” sister format may all be stepping stones along the path of this exploration. I posted the article to be transparent about the ambition and to collect different viewpoints. Thank you for having shared yours.

3 Likes

This approach has two unwelcome consequences:

First, it would increase the total (code) complexity compared to the status quo. Two parsers instead of one would double the likelihood of defects, and increase the effort needed for the continuous Q/A of the base framework. In contrast, with HRD I ultimately strive for reducing the total complexity. To reinforce the latter point, HRD could even replace the Arg_string syntax as used for session arguments. So we could consolidate even further.

Second, it would promote the proliferation of a mix of two notations, sacrificing the coherency of configurations that we enjoy today.

I would address the long-term interoperability with XML in the form of optional code that leaves the base framework untainted. E.g.,

  • A special version of the report-ROM server could offer the transparent translation between HRD and XML between the producers and consumers of reports.
  • A VFS plugin could be used as a translator in situations where an application expects XML obtained from a file, e.g., using a 3rd-party library.
  • It goes without saying that we will retain the Xml_node and Xml_generator utilities to accommodate use cases where the use of XML is a deliberate decision. Similar to how we still support format strings by Genode, but outside the base framework.
2 Likes

Norman, this is not an attempt to convince you of my point of view on this topic, but when reading your response I felt I wasn’t properly understood. That’s why I’ll try to clarify my position a little.

Norman, you have positioned my proposal as the complex one vs. your simple one. I don’t agree with this categorization.
I believe that parsing is never simple. It always consists of:

  • retrieval of data
  • tokenizing to some structures (we’re discussing here only tree like structures with attributes)
  • validation (verification of existence of required structures/attributes, of not existing invalid ones, type checking, type conversion, bounds checking, etc.)
  • processing of parsed data

and I think that the third point is the most complex one to implement properly. As far as I know, the current XML parser does not do it at all (it leaves this step to the application), which is why it can be so simple. My understanding of the proposed HRD is that it would end up with similar functionality. I wanted to express that I would see value in having this point implemented in a parser/parsing framework, given that changes are being considered in this part of the system. Otherwise each implementation in each programming language will have to duplicate this validation, potentially making things more complex in the end and not simpler.

I believe we draw the line between using and developing the system differently. For using the system, I believe the syntax is not relevant at all. As a user I should never see it. I should have tools that support me in using the system. From this we get to another conclusion you stated, that I did not formulate:

I did not want to suggest shortcuts for Genode (or making unorthodox decisions). I really value the Genode team’s strictness in making the best design decisions.
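The four stages listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report structure (`connectors`, `connector`, `name`, `width`, `height`) is invented for the example, the `Connector` class and all function names are hypothetical, and Python's standard `xml.etree.ElementTree` stands in for the tokenizing stage.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Connector:
    name: str
    width: int
    height: int


def retrieve() -> str:
    # Stage 1: stand-in for reading a report from a ROM or file.
    return '<connectors><connector name="eDP-1" width="1920" height="1080"/></connectors>'


def tokenize(text: str) -> ET.Element:
    # Stage 2: turn text into a tree of nodes with attributes.
    return ET.fromstring(text)


def validate(root: ET.Element) -> list:
    # Stage 3: check presence, convert types, check bounds.
    connectors = []
    for elem in root.findall("connector"):
        name = elem.get("name")
        if not name:
            raise ValueError("connector without name")
        try:
            width = int(elem.get("width", ""))
            height = int(elem.get("height", ""))
        except ValueError:
            raise ValueError("width and height must be integers") from None
        if width <= 0 or height <= 0:
            raise ValueError("width and height must be positive")
        connectors.append(Connector(name, width, height))
    return connectors


def process(connectors: list) -> int:
    # Stage 4: application logic works on typed data only.
    return sum(c.width * c.height for c in connectors)


print(process(validate(tokenize(retrieve()))))  # 2073600
```

In this sketch the validation stage is by far the longest, and it is the one stage that would have to be rewritten per component if the parser only delivers text.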

The suggestion of a binary format in the long-term future comes from the expectation that the balance between the amount of data inspected by people and the amount of data processed by code will change. As a (maybe poor) argument I’ll note that no one would propose a textual representation for e.g. audio data, because the ratio between the amount of data that has to be inspected by people and the overall processed data is very low.
The content of reports is also just data. It is produced by applications and consumed by applications. The main differences are the amount of this data and the ratio between the amount of data that is inspected by humans and all data that is processed by applications. A potential increase of use by the masses will influence this ratio, put more pressure on efficiency, and lessen the need for inspectability as text (of course tools for presenting it conveniently to people should exist).

I must admit that when considering configuration of applications the argument of amount of processed data seems less relevant. When I wrote my previous message I was mostly focused on processing of reports.

I really value your ideas and the amount of considerations you put into them. And I can add that I find reading HRD data very pleasant (not sure about editing because I did not try).

What is your favorite editor? vim? - asking in case I will need to like it too, to have syntax highlighting :slight_smile:

I hope that with this response I have clarified parts which were probably formulated in an unclear way in my previous message.

As I’ve followed this discussion I felt that it was missing a perspective that at first I wasn’t able to express clearly:

In my opinion the discussion around users vs. developers misses an important demographic:
Prospective integrators that would build systems based on Genode.

A power user hopefully doesn’t need to edit configuration files every day and when they do, they might get used to HRD as the configuration format for Genode. It’s the system that they are using after all.
An end user of a Genode-based appliance might not need to edit any configuration files by hand at all.

As a Genode developer, I have opinions on the aesthetics of HRD vs. XML, I might like being able to comment out a line with one character and dislike other aspects of HRD’s syntax and have a rather dispassionate relationship with XML like anybody, but as someone working on the system, I’ll get used to it either way.

Both roles have in common that they are immersed in the Genode OS Framework. However, I imagine that for a prospective system integrator, having to configure Genode OS in a custom configuration format that isn’t used anywhere else, for which there isn’t any tooling for programming languages and IDEs other than what is provided by Genode, and that has to be memorized for the sole purpose of configuring the Genode OS Framework, might be a major downside.

This isn’t meant to argue against HRD’s qualities as a format proposal. What I want to emphasize is the cost introduced by the lack of interoperability in a custom format. I believe there is a difference between using any established format (as long as it’s not yaml :wink: ) that might just be intuitively recognized by someone following the hello world tutorial and adding a custom configuration format on top of the complexity that is inherent in building an OS framework.

Hi everyone,

We want to share Gapfruit’s perspective regarding the proposal to replace XML with HRD. Gapfruit provides a product that enables other companies to develop and deploy their applications to a resilient platform, keeping the total cost of ownership to a bare minimum. We aim to make it as straightforward as possible for external developers to port their existing applications using Goa and for managed solution providers to configure and maintain the system through a cloud-based interface.

While we are intrigued by the idea of replacing XML in principle, we are concerned about adopting HRD as the alternative. HRD is a format unfamiliar to developers outside the Genode ecosystem, and its tooling will exist solely within the Genode community for the foreseeable future. Introducing a non-standard, niche configuration language creates an unnecessary barrier for our partners and customers, who already face challenges adapting to a new operating system.

We believe it is crucial to adopt formats and tools that are widely recognized and supported in the broader development ecosystem. Striving for standard tools and languages where possible minimizes friction for developers and encourages wider adoption of the platform.

Norman’s excellent article points out XML’s benefits, which are also essential to us. Weighing the different tradeoffs, sticking with XML is the best way forward for Gapfruit. If a change is necessary, we’d prefer a common standard.

It is hard to go against the two eloquent posts just made. Nonetheless I feel that HRD has value precisely because it is not beholden to previous conventions. As an armchair follower and tinkerer of Genode, the “file tree” appearance is much more intuitive to me than XML. One may ask that if such an innovation is inappropriate in a young and still malleable OS like Genode, in precisely what instance would it be appropriate?

Whilst there may prove to be insurmountable technical arguments against HRD, I hope that the Genode team (and user community) have the courage to accept short term pain for long term gain. As Genode matures, the opportunity to do so may not come again.

The Genodians article that introduces HRD notes that Wikipedia cites XML as a criticism of Genode. Whilst the subject for another thread, I note the biggest criticism is the choice of programming language (C++). A memory-safe language has almost become mandatory amongst candidates for a future OS, regardless of technical merits, and Genode may be harmed by this perception. Perhaps a successful replacement of XML will position Genode very well to take on this much bigger challenge in the future?

1 Like

A random (not very well fleshed out) idea just popped up in my head…

Could XML support be preserved as a “community project” ?

To use a (possibly not so relevant?) comparison, consider my own need for xattr (extended attribute) support. Genode’s libc and Fs-client session do not provide xattr support, out of the box. -But- Genode provides a ‘hook’ for me to connect into libc, which I can make accessible with a ten-line patch, which (giving me access to internal libc structures and “dot filepath” construction) allows me to augment Genode with my (“community provided”) xattr support and shazam, my Genode builds have xattr support. Maybe the same can be done here, i.e. once Genode retires XML support after the transitional period, that support can be moved to genode-world or some such, leveraging some newly added shallow “hooks” in Genode if that’d help? Again, I didn’t think this through so it could be there’s a show-stopper preventing that altogether.

2 Likes