[linux-dvb] Unicode Teletext (was: Re: getting started with msi tv card)

BOUWSMA Barry freebeer.bouwsma at gmail.com
Wed Jan 28 03:49:03 CET 2009


On Wed, 28 Jan 2009, Daniel Dalton wrote:

> > Maybe a full Unicode X font will include such characters
> > and I can simply map them to UTF8, but I'm primarily
> > interested in the text content information on my text console.
> > 
> > Here's the pr0n...
> > 
> >                       ???X???X*XX*???*???????           XXX*    AMI
> >                       ???X??????*??? ???X?* ???          **XXXX
> > 
> > No, this is not going to work.  There are too many characters
> > which are not yet converted to something and I'm having to add

> ah, ok... I kinda get it... :-)

Actually, your `mutt' mailer seems to have converted the
UTF-8 text I sent (which I hope reached you intact) into
ASCII, substituting its own `?' for the block characters
that should have appeared as proper UTF-8. I'll need to
check an archive to be sure.

And after quite a few too many hours, I still don't get it,
so I'm going to have to ask for help from the collective
knowledge pooled here.

I've seen that the available ISO 10646-encoded fonts usually
include the familiar box-drawing and related characters,
which I've been able to use for a few of the graphics.

Unfortunately, these seem to be based either on a 2x2 set
of quadrants or on a 3x4 array, while the teletext graphics
in use are based on a 2x3 array.

I've come across two sets of fonts that supposedly cover
the teletext character set with a 10646 encoding. But
the first one, which does include the 2x3 graphics characters
that would otherwise need a `fontspecific' encoding, seems to
have hijacked already-assigned Unicode code points in
order to display the graphics.

That is, with this font, characters such as these no longer
display properly (the selection is limited because I'm pasting
from a 512-character console font):
[◆]  U+25C6   ◆  BLACK DIAMOND
[◊]  U+25CA   ◊  LOZENGE
This is confirmed by reading the code:
const wchar_t graphutf8[128] = { // Graphic characters on an unicode terminal ISO-10646
[...]
        0x25A0,0x25A1,0x25A2,0x25A3,0x25A4,0x25A5,0x25A6,0x25A7,
[...]
        0x25B0,0x25B1,0x25B2,0x25B3,0x25B4,0x25B5,0x25B6,0x25B7,
[...]
        0x25C0,0x25C1,0x25C2,0x25C3,0x25C4,0x25C5,0x25C6,0x25C7,
[...]
        0x25D8,0x25D9,0x25DA,0x25DB,0x25DC,0x25DD,0x25DE,0x25DF,
};

I'm still trying to determine whether the second font has any
graphics and where they would be hidden -- even the handy
[█]  U+2588   █  FULL BLOCK
character is missing.


Does anyone know whether the various 2x3 graphics used in
teletext fonts are in fact present in Unicode? I haven't
been able to convince Google to give me the answer I want.
Given everything I do see in the unifont font, I would think
that such widely used characters wouldn't have been left
out...


thanks for any pointers,
barry bouwsma


