Re: [vdr] vdr-1.3.27 and UTF-8

21 Jul 2005


On Wednesday 20 July 2005 17:52, Sergei Haller wrote:
...
On Wed, 20 Jul 2005, Ludwig Nussel (LN) wrote:
...
Klaus Schmidinger wrote:
...
[...]
To me, a character is an entity that's always the same size (preferably
one byte). UTF-8 breaks with this, so if you have a string that has,
e.g., a strlen() of 10, you can't be sure that it will really print as
10 characters, because some of them may be "escaped" (multi-byte)
characters.
I think the confusion comes from the assumption that a character is
exactly one byte long.
strlen() counts bytes, not characters.
In UTF-8 a character can be up to 4 bytes long (the original
specification, RFC 2279, allowed up to 6).
Correct. Plain 7-bit ASCII takes one byte; everything else is encoded
as a multi-byte sequence, e.g. German umlauts take 2 bytes each.
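The byte/character distinction can be sketched in C. The helper name utf8_strlen is a hypothetical illustration, not a libc or VDR function; it assumes valid UTF-8 input and counts characters (code points) by skipping continuation bytes, which always have the bit pattern 10xxxxxx. For "Grüß" encoded as UTF-8, strlen() returns 6 while this count is 4.

```c
#include <stddef.h>

/* Count the characters (Unicode code points) in a NUL-terminated UTF-8
 * string. Assumes valid UTF-8: each character starts with exactly one
 * lead byte, and continuation bytes always look like 10xxxxxx. */
size_t utf8_strlen(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        /* Count every byte that is NOT a continuation byte. */
        if ((*p & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

The loop is still a single O(n) pass over the bytes, just like strlen(); it only changes which bytes are counted.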
...
IIRC, there are new functions to count characters (wcslen, wcscmp,
etc.)
Wrong. These are for wide characters, where every character uses
2 or 4 bytes (depending on the platform's wchar_t).
In fact, IF you want to support Unicode in an application, you are
better off using wide characters (wchar_t) inside the application
and UTF-8 on all external interfaces (e.g. file input/output).
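Converting at the boundary could look roughly like the sketch below: UTF-8 bytes read from a file are decoded once into a wchar_t buffer, after which wcslen() and wcscmp() count and compare real characters. The function name utf8_to_wide is hypothetical. It assumes a 32-bit wchar_t (as on Linux/glibc; a 16-bit wchar_t cannot hold characters above U+FFFF) and only rejects stray or truncated continuation bytes, not overlong forms or surrogates. In a real program the C library's locale-dependent mbstowcs() could do this job instead, provided a UTF-8 locale is active.

```c
#include <stddef.h>
#include <wchar.h>

/* Decode a NUL-terminated UTF-8 string into out (capacity out_len).
 * Returns the number of wide characters written, excluding L'\0', or
 * (size_t)-1 on a stray/truncated sequence or if out is too small.
 * Assumes a 32-bit wchar_t; does not reject overlong forms. */
size_t utf8_to_wide(const char *in, wchar_t *out, size_t out_len)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    if (out_len == 0)
        return (size_t)-1;

    while (*p != '\0') {
        unsigned long cp;
        int extra;                                        /* continuation bytes */

        if (*p < 0x80)      { cp = *p;        extra = 0; } /* 0xxxxxxx */
        else if (*p < 0xC0) { return (size_t)-1; }         /* stray 10xxxxxx */
        else if (*p < 0xE0) { cp = *p & 0x1F; extra = 1; } /* 110xxxxx */
        else if (*p < 0xF0) { cp = *p & 0x0F; extra = 2; } /* 1110xxxx */
        else                { cp = *p & 0x07; extra = 3; } /* 11110xxx */
        ++p;

        /* Each continuation byte contributes 6 more bits. Hitting the
         * terminating NUL here also fails this test, so we never read
         * past the end of a truncated sequence. */
        for (int i = 0; i < extra; ++i) {
            if ((*p & 0xC0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (*p & 0x3F);
            ++p;
        }

        if (n + 1 >= out_len)      /* keep room for the terminating L'\0' */
            return (size_t)-1;
        out[n++] = (wchar_t)cp;
    }
    out[n] = L'\0';
    return n;
}
```

Once the text is in a wchar_t buffer, the standard wide functions apply directly: for "Grüß" the decoder returns 4, and wcslen() on the result is also 4.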
Using UTF-8 inside an application gets tricky because you cannot,
for example, use strlen() to count the characters.
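One place this bites is cutting a string to a fixed number of characters (say, to fit a display column): cutting at a byte offset can split a multi-byte sequence and leave an invalid tail. A minimal sketch, assuming valid UTF-8 input; the name utf8_prefix_bytes is hypothetical and not a VDR or libc function. It returns how many bytes the first max_chars characters occupy, so the caller can cut there safely.

```c
#include <stddef.h>

/* Return the byte length of the longest prefix of the NUL-terminated
 * UTF-8 string s that holds at most max_chars characters. Cutting at
 * that offset never splits a multi-byte sequence. Assumes valid UTF-8. */
size_t utf8_prefix_bytes(const char *s, size_t max_chars)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t chars = 0;
    size_t i = 0;

    while (p[i] != '\0') {
        /* A byte that is not 10xxxxxx starts a new character. */
        if ((p[i] & 0xC0) != 0x80) {
            if (chars == max_chars)
                break;          /* the next character would exceed the limit */
            ++chars;
        }
        ++i;
    }
    return i;
}
```

For "Grüß" and a limit of 3 characters this yields 4 bytes ("Grü"), whereas a naive 3-byte cut would leave half of the "ü" behind.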
Kind regards,
Stefan
