Topic: Unicode

alexBishop

  • Jr. Member
  • Posts: 10
« on: January 23, 2011, 08:59:28 AM »
I am creating my own OS and want it to support Unicode. Can anyone help me?

QUASAR

  • Jr. Member
  • Posts: 4
Re: Unicode
« Reply #1 on: January 23, 2011, 02:17:03 PM »
Sure we can, but what is your problem?

Frank Kotler

  • NASM Developer
  • Hero Member
  • Posts: 2490
  • Country: us
Re: Unicode
« Reply #2 on: January 23, 2011, 05:50:31 PM »
Supporting Unicode! :)

I tried to "merge" this topic with an earlier question, but that didn't seem to work, so I'll try it this way. Alex asked:

Does anyone know how I can make my operating system support Unicode? I have already written some code for printing font characters to the screen, but I only have access to the encoding for the ASCII characters, since those are already on the computer at startup. Does anyone know where I can find the full Unicode encoding?

And Rob replied, in essence:

As I see it, this breaks down into two issues: the "encoding" (determining which "glyph" we want to draw) and a "font file" (where the pixels go).

I think the "encoding" is done in two common ways, UTF-8 and UTF-16, with UTF-8 being the more common (I'm not sure of this; Xlib has a lot of routines with "16" in the name, and Windows has APIs ending in "A" or "W"...). I think it's UTF-8 where we grab the first byte and examine the high bit. If it's clear, it is a normal ASCII character (Unicode and ASCII overlap here). If the high bit is set, we count the consecutive set bits that follow it to know how many further bytes are involved. I don't know if the remaining bytes are arranged big-endian or little-endian, but in either case, we combine 'em into a single number. It is Unicode's job to standardize which character is represented by this number. (Am I confused yet?)
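For what it's worth, that description is close. The lead byte's run of 1 bits gives the total length of the sequence (110xxxxx = 2 bytes, 1110xxxx = 3, 11110xxx = 4), every following byte has the form 10xxxxxx, and the payload bits are combined most significant first, so byte order never comes up in UTF-8 (it does in UTF-16, which comes in big-endian and little-endian flavors). Here is a minimal decoder sketch in C, assuming the text is already sitting in a byte buffer; the function name utf8_decode is made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s (n bytes
 * available). On success, stores the code point in *cp and returns the
 * number of bytes consumed (1-4). Returns 0 on malformed or truncated input.
 * Sketch only: it rejects overlong forms, UTF-16 surrogates and values
 * above U+10FFFF, but leaves error recovery to the caller. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    if (n == 0)
        return 0;

    unsigned char lead = s[0];
    size_t len;
    uint32_t value;

    if (lead < 0x80) {                    /* 0xxxxxxx: plain ASCII */
        *cp = lead;
        return 1;
    } else if ((lead & 0xE0) == 0xC0) {   /* 110xxxxx: 1 more byte */
        len = 2;
        value = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {   /* 1110xxxx: 2 more bytes */
        len = 3;
        value = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {   /* 11110xxx: 3 more bytes */
        len = 4;
        value = lead & 0x07;
    } else {
        return 0;                         /* stray 10xxxxxx or invalid lead */
    }

    if (n < len)
        return 0;                         /* sequence cut off */

    /* Each continuation byte (10xxxxxx) adds 6 bits, most significant
     * first: the lead byte holds the top bits of the code point. */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        value = (value << 6) | (s[i] & 0x3F);
    }

    /* Smallest code point that legitimately needs len bytes. */
    static const uint32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    if (value < min_for_len[len] || value > 0x10FFFF ||
        (value >= 0xD800 && value <= 0xDFFF))
        return 0;

    *cp = value;
    return len;
}
```

To walk a string, call it in a loop and advance by the returned length; when it returns 0, one common policy is to draw U+FFFD (the replacement character) and skip a single byte.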

Alex mentions that there are fonts in ROM for ASCII characters. I've used the 8 x 8 font in ROM; I forget the address (it can be found with a BIOS interrupt). Unless I'm mistaken, this is known as a "raster font": each character is stored as rows of bits, one bit per pixel. A more modern method is a "vector" (or "outline") font. Rather than rows and columns of pixels, this describes the outline of each character as points joined by lines and curves, so it can be scaled to any size we want. (Now am I confused?)
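Drawing from a raster font like the ROM one is just a loop over bits. Here is a sketch in C, assuming a linear framebuffer with one byte per pixel at 320 x 200 (as in VGA mode 13h) and a font stored as 8 bytes per character, one byte per row, with bit 7 as the leftmost pixel. The names putpixel and draw_glyph8x8 are hypothetical, and the font bytes are assumed to be somewhere the OS can already reach (on VGA-compatible BIOSes, INT 10h with AX=1130h returns a pointer to the ROM fonts in ES:BP).

```c
#include <stdint.h>

#define FB_WIDTH  320   /* assumed: 320x200, 1 byte per pixel (like VGA mode 13h) */
#define FB_HEIGHT 200

/* Hypothetical putpixel for a linear, byte-per-pixel framebuffer.
 * Pixels outside the screen are silently clipped. */
static void putpixel(uint8_t *fb, int x, int y, uint8_t color)
{
    if (x >= 0 && x < FB_WIDTH && y >= 0 && y < FB_HEIGHT)
        fb[y * FB_WIDTH + x] = color;
}

/* Draw character ch from an 8x8 raster font at (x, y).
 * Assumed layout: 8 bytes per character, one byte per row,
 * bit 7 = leftmost pixel. Clear bits are left untouched (transparent). */
static void draw_glyph8x8(uint8_t *fb, const uint8_t *font, unsigned ch,
                          int x, int y, uint8_t color)
{
    const uint8_t *rows = font + ch * 8;
    for (int row = 0; row < 8; row++) {
        for (int col = 0; col < 8; col++) {
            if (rows[row] & (0x80 >> col))
                putpixel(fb, x + col, y + row, color);
        }
    }
}
```

Printing a string is then a matter of calling draw_glyph8x8 for each character and moving x along by 8. The same loop works for a 16-row ROM font by changing the 8s that count rows.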

Going to, clicking on "links" and then "last resort font" offers to let me download a font... if I agree to a great saggy pantload of legalese. In particular, "thou shalt not reverse engineer", which is the first thing I'd want to do... so I didn't. Perhaps there's enough documentation that I wouldn't have to...

Codeferever was asking about displaying Chinese characters ("in DOS", but I think he has his own OS in mind, too). It's essentially the same question, I think, so there's some interest in this. It's not really an "assembly language question", much less a "Nasm question"; you do it the same way as in C or BASIC, only in assembly language... I ASSume.

I think if I were to attempt to implement this, I'd need to figure out a few things:

Details of the encoding?

Having obtained a "glyph number", how do I know which "font file" I need to use?

Where do I get such a "font file" - preferably "free" or with the most flexible license?

What is the format of this font file?

Knowing all this, we can probably figure out the "putpixel" calls to actually draw the thing (and how to do "putpixel", if that's a question).
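For the second question, one common approach is a table of code point ranges, each pointing at the font that covers it, with a catch-all font (like the "last resort font" mentioned above) when nothing matches. Here is a sketch in C. The file names and font_for_codepoint are hypothetical, and the ranges are just a few Unicode blocks picked as examples. A real system would more likely read coverage from the fonts themselves; a TrueType font, for instance, carries a cmap table mapping character codes to glyphs.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per range: which code points a given font covers. */
struct font_range {
    uint32_t first;      /* first code point covered (inclusive) */
    uint32_t last;       /* last code point covered (inclusive) */
    const char *font;    /* hypothetical font file name */
};

/* Example ranges taken from a few Unicode blocks. */
static const struct font_range font_table[] = {
    { 0x0000, 0x007F, "latin.fnt" },   /* Basic Latin (ASCII) */
    { 0x0080, 0x024F, "latin.fnt" },   /* Latin-1 Supplement, Latin Extended-A/B */
    { 0x0370, 0x03FF, "greek.fnt" },   /* Greek and Coptic */
    { 0x4E00, 0x9FFF, "cjk.fnt" },     /* CJK Unified Ideographs */
};

/* Hypothetical fallback used when no range matches. */
static const char last_resort_font[] = "lastresort.fnt";

/* Return the font covering cp, or the fallback font.
 * A linear scan is fine for a short table; a long one could be
 * kept sorted and searched with a binary search instead. */
static const char *font_for_codepoint(uint32_t cp)
{
    for (size_t i = 0; i < sizeof font_table / sizeof font_table[0]; i++) {
        if (cp >= font_table[i].first && cp <= font_table[i].last)
            return font_table[i].font;
    }
    return last_resort_font;
}
```

Combined with a decoder for the encoding, this gives the pipeline from the list: bytes, then code point, then font, then glyph, then putpixel.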

None of that is very "useful"... but it may help to refine the question...