Support for international input methods

I have decided to write a short piece here for those who are interested, and I hope to attract the attention of the developers.

Once, in the old days, the “characters” used in computers were based on 7-bit ASCII, which allows 128 characters. That is sufficient for the Latin alphabet, digits, and various other symbols.

From the very beginning, there were problems with single-byte encodings: even most European languages have extra characters, and no single 8-bit code page was enough to support all of them at the same time. Various “local” variations came into use.

The problem was further compounded by Chinese, Japanese, and Korean, the so-called CJK languages. Chinese characters are built up from various building blocks (called “radicals”); a character consists of a combination of one or more radicals. Due to this “modular” nature, there are tens of thousands of characters.

Skipping some of the historical developments, about 40 years ago the situation was as follows:

  • Japanese: there is a list of “standard” characters (approx. 1950 characters); “the media” are requested to limit their use to this list. In practice, the situation is more complicated: there are family names and place names which are (traditionally) written with other characters. Thus, in practice there is the Japanese Industrial Standard (JIS) list of about 6400 characters.
  • Korean is written in Hangul, which are compound characters built of one, two, or three phonetic parts. There are more than 11,000 possible combinations. Besides this, many people in Korea also use Chinese characters to write family names, etc.
  • In mainland China, the communist government has mandated the use of an official list of about 2000 “simplified” characters.
  • But in places like Taiwan, Hong Kong, and Macao, the traditional Chinese characters are still in use.
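The arithmetic behind the Hangul figure can be checked directly: Unicode composes its precomposed Hangul syllables from 19 initial consonants, 21 medial vowels, and 28 finals (including “no final”). A minimal Python sketch using the standard Unicode composition formula:

```python
# Precomposed Hangul syllables in Unicode: 19 initials x 21 medials
# x 28 finals (including "no final consonant").
INITIALS, MEDIALS, FINALS = 19, 21, 28
total = INITIALS * MEDIALS * FINALS   # 11172 precomposed syllables

def compose(initial: int, medial: int, final: int = 0) -> str:
    """Compose one Hangul syllable from jamo indices (Unicode algorithm)."""
    return chr(0xAC00 + (initial * MEDIALS + medial) * FINALS + final)

# "han" = initial ㅎ (index 18), medial ㅏ (0), final ㄴ (4)
print(total)               # 11172
print(compose(18, 0, 4))   # 한
```

The syllable block starts at U+AC00 precisely so that this index arithmetic works; decomposition is the same formula run in reverse.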

The situation is even more complicated in practice: suppose you want to edit a critical edition of a 12th-century Buddhist text. Until very recently, such work required the creation of custom fonts and character encodings.

It is clear that at least 16 bits are needed to encode the CJK languages. In the UNIX community work started early, leading to EUC, the Extended Unix Code (EUC-JP, EUC-KR, and various flavours for Chinese). However, the needs of the UNIX community were not the same as the needs of “daily work”: consider electronic cash registers, signage at railway stations, displays on electronic devices, cell phones, etc. In the 1980s two more encodings appeared: JIS and Shift-JIS. To make the situation even more interesting, Microsoft opted for yet another encoding, Code page 932.
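The practical consequence of these competing encodings is that the very same character maps to a different byte sequence under each scheme. A small Python sketch, with byte values taken from the standard codec tables:

```python
# The same character, 日 (U+65E5, "sun/day"), encoded under the
# legacy schemes mentioned above and under UTF-8:
ch = "日"
print(ch.encode("euc-jp").hex())     # c6fc
print(ch.encode("shift_jis").hex())  # 93fa
print(ch.encode("cp932").hex())      # 93fa  (Code page 932 extends Shift-JIS)
print(ch.encode("utf-8").hex())      # e697a5
```

Reading text in the wrong encoding therefore does not merely misrender a few characters; it produces complete garbage (the Japanese even have a word for it, “mojibake”).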

The above is limited to output. For input the situation is even more complicated. Consider a newspaper in 19th-century Japan: printing is not so much of a problem (you can make characters in movable type), but writing is done by hand. The question is how to make an input system for Japanese (or Korean, or Chinese).

The problem of input was really only solved in the 1980s with so-called “input method modifiers”: since it is hardly practical to make keyboards with thousands of keys, methods evolved to type “phonetically” on traditional keyboards. An “interim layer” then captures the input from the keyboard and forwards it to other software, which transforms the keyboard input into the desired form: Japanese, or Korean, or Chinese, or Hindi, or …
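As a toy illustration of that interim layer, here is a minimal Python sketch of the first stage only (phonetic syllables to kana). The conversion table is a tiny hypothetical fragment, and a real input method would additionally perform kana-to-kanji conversion with interactive candidate selection:

```python
# Toy "interim layer": map phonetic keystrokes (romaji) to kana.
# The table is a tiny illustrative fragment, not a real dictionary.
ROMAJI_TO_HIRAGANA = {
    "ni": "に", "ho": "ほ", "n": "ん", "go": "ご",
}

def convert(keystrokes: list[str]) -> str:
    """Replace each phonetic syllable with its kana; pass unknown input through."""
    return "".join(ROMAJI_TO_HIRAGANA.get(k, k) for k in keystrokes)

print(convert(["ni", "ho", "n", "go"]))  # にほんご ("Japanese language")
```

Real input methods also have to deal with ambiguity (segmenting the keystroke stream, homophones among kanji candidates), which is where most of their complexity lives.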

Apple software included support for Japanese from a very early stage, which led to a large market share for Apple computers in Japan. In Windows, support for Japanese was rather poor, leading to third-party plugins such as the famous ATOK (ASCII to Kanji). Linux offered support for Japanese (based on EUC) from the earliest phase.

I am surprised that in 2026 Genode does not support input method modifiers. Since the arrival of Unicode, the problem of having “competing” encodings has been solved: UTF-8 covers everything (well….), and there is a wealth of information and experience from Linux and other applications. If Genode Labs is serious about being “commercial”, they should develop an IME. Just consider something as simple as a car navigation system: if you cannot support Japanese, none of the Japanese manufacturers will ever adopt your product.

Genode has supported Unicode codepoints and UTF-8 strings for years.

Genode does not support alternative input methods at the base level. This is comparable to Linux, where distributions have to add ibus (or something similar) on top of the kernel and libc.

Genode supports Qt6, which to my knowledge comes with support for other input methods. So, please feel free to give it a shot and contribute any improvements. We’re still talking about the Genode Open-Source project, right?

Haha, you are right!

The problem is as follows: if there are other people who can develop input methods, there is no reason that I could not also develop an input method. But it will take a very, very long time for me :grinning_face_with_smiling_eyes:

I have never really thought about input methods, I have always taken them for granted, but for the last few weeks I have been looking around a bit to learn more. Apparently, there is a document from the “North East Asian Open Source Society” that defines a sort of general framework for an input method, and apparently IBus is (at least partly) based on that document. I don’t know whether the NEAOSS specification is specific to Linux-like operating systems or more general.

But to be honest, at the moment, I do not know how to write an input method. I will try to study a bit more in the coming weeks.

It would be nice to add fonts to the standard Sculpt OS so that non-English speakers can at least read their own language in Falkon and get a better user experience. There are TTF fonts that cover all Unicode characters. Those fonts are not very pretty, but they are a one-size-fits-all solution. Otherwise, take inspiration from the font collections used in Ubuntu or Gentoo Linux.