I have decided to write a short piece here for those who are interested, and I hope to attract the attention of the developers.
Once, in the old days, the “characters” used in computers were based on 7-bit ASCII, which allows 128 characters (a full 8-bit byte allows 256). That is sufficient for the Latin alphabet, digits, and various punctuation and control characters.
From the very beginning, there were problems with this 1-byte encoding: most European languages use extra characters (accented letters, for example), and a single byte cannot hold all of them at the same time. Various “local” variants came into use, such as the ISO 8859 family and vendor-specific code pages.
The problem was further compounded by Chinese, Japanese and Korean, the so-called CJK languages. Chinese characters are built up from various building blocks (usually called “radicals” in English); a “character” consists of a combination of one or more radicals. Due to this “modular” nature, there are tens of thousands of characters.
Skipping some of the historical developments, about 40 years ago the situation was as follows:
- Japanese: there is a list of “standard” characters (approx. 1950 characters); “the media” are requested to limit their use to this list. In practice, the situation is more complicated: many family names and place names are traditionally written with other characters. Hence the Japanese Industrial Standard (JIS) list of about 6400 characters.
- Korean is written in Hangul, whose characters are compound syllable blocks built from two or three phonetic parts. There are more than 11,000 possible combinations. Besides this, many people in Korea also use Chinese characters to write family names, etc.
- In mainland China the communist government has mandated the use of an official list of about 2000 “simplified” characters.
- But in places like Taiwan, Hong Kong and Macao the traditional Chinese characters are still in use.
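The “more than 11,000” figure for Hangul can be checked with a little arithmetic. The sketch below assumes the layout Unicode uses for precomposed Hangul syllables (19 initial consonants, 21 vowels, 28 final positions including “no final”, starting at U+AC00); the function name and constants are only illustrative.

```python
# Unicode lays out precomposed Hangul syllables arithmetically.
L_COUNT = 19     # initial consonants
V_COUNT = 21     # vowels
T_COUNT = 28     # final consonants, with index 0 meaning "no final"
S_BASE = 0xAC00  # code point of the first syllable block

def compose_hangul(initial: int, vowel: int, final: int = 0) -> str:
    """Return the syllable block for the given 0-based jamo indices."""
    if not (0 <= initial < L_COUNT and 0 <= vowel < V_COUNT and 0 <= final < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (initial * V_COUNT + vowel) * T_COUNT + final)

# Number of possible syllable blocks: 19 * 21 * 28
TOTAL_SYLLABLES = L_COUNT * V_COUNT * T_COUNT

# Initial index 18, vowel index 0, final index 4 gives the block at U+D55C.
han = compose_hangul(18, 0, 4)

print(TOTAL_SYLLABLES, han)
```

The product comes to 11,172 syllable blocks, which is why a 1-byte encoding was never an option for Korean.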
The situation is even more complicated in practice: suppose you want to edit a critical edition of a 12th-century Buddhist text. Until very recently, such work required the creation of custom fonts and character encodings.
It is clear that at least 16 bits are needed to encode CJK languages. In the UNIX community work started early, leading to EUC, Extended Unix Code (EUC-JP, EUC-KR and various flavours for Chinese). However, the needs of the UNIX community were not the same as the needs for “daily work”: consider electronic cash registers, signage at railway stations, displays on electronic devices, cell phones, etc. In the 1980s two more encodings appeared: JIS and Shift-JIS. To make the situation even more interesting, Microsoft opted for yet another encoding, Code page 932, its own variant of Shift-JIS.
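To make the “competing encodings” problem concrete, here is a small sketch that encodes the same three-character Japanese word with the encodings mentioned above, plus UTF-8 for comparison. It relies only on codecs that ship with Python; the sample word and the helper name `encode_all` are my own choices for illustration.

```python
# The same text, encoded with several Japanese encodings and UTF-8.
TEXT = "日本語"

# Codec names as registered in Python's standard library.
ENCODINGS = ["euc_jp", "shift_jis", "cp932", "iso2022_jp", "utf-8"]

def encode_all(text: str) -> dict:
    """Map each encoding name to the bytes it produces for `text`."""
    return {name: text.encode(name) for name in ENCODINGS}

encoded = encode_all(TEXT)

for name, data in encoded.items():
    # Print the byte length and a hex dump for each encoding.
    print(f"{name:12} {len(data):2d} bytes  {data.hex()}")
```

The byte sequences differ between EUC-JP, Shift-JIS and the escape-sequence-based JIS encoding, so a program that guesses the wrong one shows garbage. That is the practical cost of having several standards for one language.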
The above is limited to output. For input the situation is even more complicated. Consider a newspaper in 19th-century Japan: printing is not so much of a problem (you can make characters in movable type), but writing is done by hand. The question is how to build an input system for Japanese (or Korean, or Chinese).
The problem of input was really only solved in the 1980s with so-called “input method editors” (IMEs): since it is hardly practical to make keyboards with thousands of keys, methods evolved to type “phonetically” on “traditional” keyboards. An “interim layer” captures the input from the keyboard and forwards it to another program, which converts the keystrokes into the desired form: Japanese, or Korean, or Chinese, or Hindi, or …
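The two stages described above can be sketched in a few lines. This is a toy, not how any real input method is implemented: the romaji table covers only a handful of syllables, the candidate dictionary is made up for the example, and all names (`ROMAJI_TO_KANA`, `CANDIDATES`, `romaji_to_kana`, `candidates`) are hypothetical.

```python
# Stage 1 table: typed Latin letters -> hiragana (tiny illustrative subset).
ROMAJI_TO_KANA = {
    "ni": "に", "ho": "ほ", "n": "ん", "go": "ご",
    "ka": "か", "ji": "じ",
}

# Stage 2 table: kana reading -> possible written forms (hypothetical data).
CANDIDATES = {
    "にほんご": ["日本語"],
    "かんじ": ["漢字", "感じ", "幹事"],
}

def romaji_to_kana(keys: str) -> str:
    """Convert keystrokes to hiragana using greedy longest match."""
    out = []
    i = 0
    while i < len(keys):
        for length in (3, 2, 1):
            chunk = keys[i:i + length]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            # No table entry matches at this position.
            raise ValueError(f"no kana for input at position {i}: {keys[i:]!r}")
    return "".join(out)

def candidates(keys: str) -> list:
    """Return the kana reading followed by any dictionary candidates."""
    kana = romaji_to_kana(keys)
    return [kana] + CANDIDATES.get(kana, [])

print(candidates("nihongo"))
print(candidates("kanji"))
```

Typing `kanji` yields one reading but several written forms, so the user has to pick one. Ranking and presenting those candidates is where most of the real work in an input method lies.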
Apple software included support for Japanese from a very early stage, leading to a large market share for Apple computers in Japan. In Windows the support for Japanese was rather poor, leading to third-party products such as the famous ATOK (Advanced Technology Of Kana-Kanji Transfer). Linux offered support for Japanese (based on EUC) from the earliest phase.
I am surprised that in 2026 Genode does not support input method editors. Since the arrival of Unicode, the problem of “competing” encodings has been solved: Unicode, typically encoded as UTF-8, covers everything (well, almost), and there is a wealth of information and experience from Linux and other platforms. If Genode Labs is serious about being “commercial”, they should develop an IME. Just consider something as simple as a car navigation system: if you cannot support Japanese input, none of the Japanese manufacturers will ever adopt your product.