Re: [vdr] vdr-1.3.27 and UTF-8

20 Jul 2005


On Wed, 20 Jul 2005, Klaus Schmidinger (KS) wrote:
...
...
I think the confusion comes from the assumption that a character is
exactly one byte long.
strlen counts bytes, not characters.
In UTF-8 a character can be up to 4 bytes long (up to 6 in the original
definition).
IIRC, there are new functions that work on characters rather than bytes
(wcslen, wcscmp, etc.)
Aren't you confusing this with "wide character" functions?
Yes, I am talking about wide characters. I don't think I am confusing
anything (correct me if I'm wrong).
From the glibc manual:
...
Introduction to Extended Characters
A variety of solutions is available to overcome the differences between 
character sets with a 1:1 relation between bytes and characters and 
character sets with ratios of 2:1 or 4:1. [...]
As shown in some other part of this manual, a completely new family has 
been created of functions that can handle wide character texts in 
memory. The most commonly used character sets for such internal wide 
character representations are Unicode and ISO 10646 [...] Unicode was 
originally planned as a 16-bit character set; whereas, ISO 10646 was 
designed to be a 31-bit large code space. [...]
UTF-8 is an ASCII compatible encoding where ASCII characters are 
represented by ASCII bytes and non-ASCII characters by sequences of 2-6 
non-ASCII bytes [...]
To represent wide characters the char type is not suitable.
For this reason the ISO C standard introduces [...] wchar_t,
[...]
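
To make the bytes-versus-characters point concrete, here is a minimal sketch in C. It assumes the input is valid UTF-8 and counts code points by skipping continuation bytes (those of the form 10xxxxxx). utf8_count is a hypothetical helper name of mine, not a libc function and not anything taken from vdr.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the code points in a UTF-8 string.
 * Assumes s is valid UTF-8. Every byte that is NOT a continuation
 * byte (10xxxxxx) starts a new character, so we count only those. */
size_t utf8_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Example: "Gr\xC3\xBC\xC3\x9F" "e" is the UTF-8 encoding of a
 * five-character word containing two 2-byte characters.
 * strlen() reports 7 for it, utf8_count() reports 5. */
```

The literal is split in two so that the trailing "e" is not swallowed by the preceding \x escape. The other route, the one the manual describes, is to convert to wchar_t first (for example with mbstowcs under a UTF-8 locale) and then use wcslen on the result; that depends on the locale being set correctly, which is why the sketch above avoids it.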
Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\
-------------------------------------------------------------------- __V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain
