On Wednesday 20 July 2005 00:00, Harald Milz wrote:
So at the end of the day, there's no way around a proper charset conversion... is there anything I can do to help?
Alexander Riedel announced UTF-8 Patch 0.0.3 on May 12 in this list.
Wolfgang Rohdewald wolfgang@rohdewald.de wrote:
On Wednesday 20 July 2005 00:00, Harald Milz wrote:
So at the end of the day, there's no way around a proper charset conversion... is there anything I can do to help?
Alexander Riedel announced UTF-8 Patch 0.0.3 on May 12 in this list.
Mmmm, yes, this patch is against 1.3.24, and applying it to 1.3.27 says,
seneca:/usr/local/src/vdr/VDR # patch -p0 --dry-run < ../vdr-utf8-0.0.3.patch-0001.bin
patch unexpectedly ends in middle of line
patch: **** Only garbage was found in the patch input.
It seems nobody cares about this functionality when it was against 1.3.24 and we're now working on 1.3.28. Will 1.3.28 be UTF-8 capable? Klaus, please. I understand you get lots of patches solving only minor and local problems but this is about international use of vdr and IMHO important enough.
PS: It would be easier to avoid questions like mine (if you wanted to tell me I should look up the archives) if the list archive were searchable. ;-)
Harald Milz wrote:
Wolfgang Rohdewald wolfgang@rohdewald.de wrote:
On Wednesday 20 July 2005 00:00, Harald Milz wrote:
So at the end of the day, there's no way around a proper charset conversion... is there anything I can do to help?
Alexander Riedel announced UTF-8 Patch 0.0.3 on May 12 in this list.
Mmmm, yes, this patch is against 1.3.24, and applying it to 1.3.27 says,
seneca:/usr/local/src/vdr/VDR # patch -p0 --dry-run < ../vdr-utf8-0.0.3.patch-0001.bin
patch unexpectedly ends in middle of line
patch: **** Only garbage was found in the patch input.
It seems nobody cares about this functionality when it was against 1.3.24 and we're now working on 1.3.28. Will 1.3.28 be UTF-8 capable? Klaus, please. I understand you get lots of patches solving only minor and local problems but this is about international use of vdr and IMHO important enough.
This would be very important, indeed, but maybe Klaus sees this differently. A patch would also be OK for a while; unfortunately I haven't been able to patch VDR with UTF-8 support so far (not even that 1.3.24 version, maybe due to other patches). I'd like to contribute to vdradmin in this respect, too (I actually have a little patch working here, but only for displaying Romanian EPG data correctly in vdradmin). I think I'd have to rework this when VDR finally supports UTF-8.
PS: It would be easier to avoid questions like mine (if you wanted to tell me I should look up the archives) if the list archive were searchable. ;-)
You could use snews://snews.gmane.org in your newsreader (I'm using Thunderbird on Windows and Linux) to search, read and post to the VDR ML. It's replicated there almost in sync, within a matter of minutes, and then you might want to turn off email delivery from the ML itself, as you could use the newsreader for everything...
Bye Lucian
Harald Milz wrote:
Wolfgang Rohdewald wolfgang@rohdewald.de wrote:
On Wednesday 20 July 2005 00:00, Harald Milz wrote:
So at the end of the day, there's no way around a proper charset conversion... is there anything I can do to help?
Alexander Riedel announced UTF-8 Patch 0.0.3 on May 12 in this list.
Mmmm, yes, this patch is against 1.3.24, and applying it to 1.3.27 says,
seneca:/usr/local/src/vdr/VDR # patch -p0 --dry-run < ../vdr-utf8-0.0.3.patch-0001.bin
patch unexpectedly ends in middle of line
patch: **** Only garbage was found in the patch input.
I've rediffed the patch against 1.3.25 some time ago. Apparently it still applies to 1.3.27. I didn't check whether it actually works though. You can find it here: http://www.suse.de/~lnussel/vdr/ (May take some time until the web server gets synced).
It's also in the vdr13 rpm packages on the ftp server. You have to rebuild the src.rpm using --with freetype to enable it though.
It seems nobody cares about this functionality when it was against 1.3.24 and we're now working on 1.3.28. Will 1.3.28 be UTF-8 capable? Klaus, please. I understand you get lots of patches solving only minor and local problems but this is about international use of vdr and IMHO important enough.
The changes required for true UTF-8 support are not trivial. The patch from Alexander is just a start. Something for post-1.4 IMHO.
cu Ludwig
Harald Milz wrote:
Wolfgang Rohdewald wolfgang@rohdewald.de wrote:
On Wednesday 20 July 2005 00:00, Harald Milz wrote:
So at the end of the day, there's no way around a proper charset conversion... is there anything I can do to help?
Alexander Riedel announced UTF-8 Patch 0.0.3 on May 12 in this list.
Mmmm, yes, this patch is against 1.3.24, and applying it to 1.3.27 says,
seneca:/usr/local/src/vdr/VDR # patch -p0 --dry-run < ../vdr-utf8-0.0.3.patch-0001.bin
patch unexpectedly ends in middle of line
patch: **** Only garbage was found in the patch input.
It seems nobody cares about this functionality when it was against 1.3.24 and we're now working on 1.3.28. Will 1.3.28 be UTF-8 capable? Klaus, please. I understand you get lots of patches solving only minor and local problems but this is about international use of vdr and IMHO important enough.
Well, UTF-8 is something I, personally, absolutely don't need - and don't want! So any implementation of UTF-8 in VDR will, first and foremost, have to be _completely_ enclosed in #ifdefs, so that a "clean" version of VDR can be compiled ;-)
That said, there will be no UTF-8 support in the official version 1.4 of VDR. Maybe later, but it has no big priority for me...
Klaus
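[Editorial sketch] The "completely enclosed in #ifdefs" requirement above might look roughly like the following. This is only an illustration: VDR_UTF8 and next_char() are hypothetical names invented for this note, not anything taken from VDR or from Alexander's patch.

```c
#include <stddef.h>

/* Advance to the start of the next character in s.
 * VDR_UTF8 is a hypothetical macro name used only for this sketch;
 * compile with -DVDR_UTF8 to enable the multi-byte path.
 * Without it, the function behaves as plain one-byte-per-character code. */
const char *next_char(const char *s)
{
    if (*s == '\0')
        return s;               /* already at the end of the string */
    s++;
#ifdef VDR_UTF8
    /* Skip UTF-8 continuation bytes (bit pattern 10xxxxxx). */
    while (((unsigned char)*s & 0xC0) == 0x80)
        s++;
#endif
    return s;
}
```

Built without the macro, the preprocessor removes the loop entirely, so the single-byte build carries no UTF-8 code at all.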
Klaus Schmidinger wrote:
Harald Milz wrote:
It seems nobody cares about this functionality when it was against 1.3.24 and we're now working on 1.3.28. Will 1.3.28 be UTF-8 capable? Klaus, please. I understand you get lots of patches solving only minor and local problems but this is about international use of vdr and IMHO important enough.
Well, UTF-8 is something I, personally, absolutely don't need - and don't want! So any implementation of UTF-8 in VDR will, first and foremost, have to be _completely_ enclosed in #ifdefs, so that a "clean" version of VDR can be compiled ;-)
That said, there will be no UTF-8 support in the official version 1.4 of VDR. Maybe later, but it has no big priority for me...
What's the problem with utf-8? Why do you consider it "unclean"?
just curious, Johannes
Johannes Stezenbach wrote:
Klaus Schmidinger wrote:
Harald Milz wrote:
It seems nobody cares about this functionality when it was against 1.3.24 and we're now working on 1.3.28. Will 1.3.28 be UTF-8 capable? Klaus, please. I understand you get lots of patches solving only minor and local problems but this is about international use of vdr and IMHO important enough.
Well, UTF-8 is something I, personally, absolutely don't need - and don't want! So any implementation of UTF-8 in VDR will, first and foremost, have to be _completely_ enclosed in #ifdefs, so that a "clean" version of VDR can be compiled ;-)
That said, there will be no UTF-8 support in the official version 1.4 of VDR. Maybe later, but it has no big priority for me...
What's the problem with utf-8? Why do you consider it "unclean"?
just curious, Johannes
To me, a character is an entity that's always the same size (preferably one byte). UTF-8 breaks with this, so if you have a string that has, e.g. a strlen() of 10, you can't be sure that this will be really 10 printing characters because there might be some "escaped" characters.
Well, I admit that I'm in a lucky position where iso8859-1 is totally sufficient for me. That's why I have so little interest in doing anything in that direction...
Klaus
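[Editorial sketch] The strlen() point above can be made concrete with a few lines of C. The helper name utf8_codepoints() is made up for this note, and the code assumes its input is already valid UTF-8. It counts code points, which is still not always the same as the number of glyphs drawn on screen (combining characters, for example).

```c
#include <stddef.h>
#include <string.h>

/* Count UTF-8 code points by counting every byte that is NOT a
 * continuation byte (bit pattern 10xxxxxx).
 * Assumes valid UTF-8 input; no validation is attempted. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the word "Grüße" stored as UTF-8, written in C as "Gr\xc3\xbc\xc3\x9f" "e", strlen() reports 7 bytes while utf8_codepoints() reports 5 characters. The literal is split in two so that the trailing "e" is not read as part of the \x9f hex escape.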
Klaus Schmidinger wrote:
[...] To me, a character is an entity that's always the same size (preferably one byte). UTF-8 breaks with this, so if you have a string that has, e.g. a strlen() of 10, you can't be sure that this will be really 10 printing characters because there might be some "escaped" characters.
Well, IIRC "dvb strings" can have embedded control characters so if you didn't just filter them out you'd have to deal with strlen != number of characters to render anyways.
cu Ludwig
On Wed, 20 Jul 2005, Ludwig Nussel (LN) wrote:
Klaus Schmidinger wrote:
[...] To me, a character is an entity that's always the same size (preferably one byte). UTF-8 breaks with this, so if you have a string that has, e.g. a strlen() of 10, you can't be sure that this will be really 10 printing characters because there might be some "escaped" characters.
I think the confusion comes from the assumption that a character is exactly one byte long.
strlen counts bytes not characters.
in utf-8 a character can be up to 4 (or was it 8) bytes long.
IIRC, there are new functions to count characters (wstrlen, wstrcmp, etc.)
c ya Sergei
Sergei Haller wrote:
On Wed, 20 Jul 2005, Ludwig Nussel (LN) wrote:
Klaus Schmidinger wrote:
[...] To me, a character is an entity that's always the same size (preferably one byte). UTF-8 breaks with this, so if you have a string that has, e.g. a strlen() of 10, you can't be sure that this will be really 10 printing characters because there might be some "escaped" characters.
I think the confusion comes from the assumption that a character is exactly one byte long.
strlen counts bytes not characters.
in utf-8 a character can be up to 4 (or was it 8) bytes long.
IIRC, there are new functions to count characters (wstrlen, wstrcmp, etc.)
Aren't you confusing this with "wide character" functions?
Klaus
Ludwig Nussel wrote:
Klaus Schmidinger wrote:
[...] To me, a character is an entity that's always the same size (preferably one byte). UTF-8 breaks with this, so if you have a string that has, e.g. a strlen() of 10, you can't be sure that this will be really 10 printing characters because there might be some "escaped" characters.
Well, IIRC "dvb strings" can have embedded control characters so if you didn't just filter them out you'd have to deal with strlen != number of characters to render anyways.
cu Ludwig
Those control characters are filtered in libsi.
Klaus
On Wed, 20 Jul 2005, Klaus Schmidinger (KS) wrote:
I think the confusion comes from the assumption that a character is exactly one byte long.
strlen counts bytes not characters. in utf-8 a character can be up to 4 (or was it 8) bytes long.
IIRC, there are new functions to count characters (wstrlen, wstrcmp, etc.)
Aren't you confusing this with "wide character" functions?
yes, I am talking about wide characters. I don't think I am confusing anything (correct me if I'm wrong)
from glibc manual:
Introduction to Extended Characters
A variety of solutions is available to overcome the differences between character sets with a 1:1 relation between bytes and characters and character sets with ratios of 2:1 or 4:1. [...]
As shown in some other part of this manual, a completely new family has been created of functions that can handle wide character texts in memory. The most commonly used character sets for such internal wide character representations are Unicode and ISO 10646 [...] Unicode was originally planned as a 16-bit character set; whereas, ISO 10646 was designed to be a 31-bit large code space. [...]
UTF-8 is an ASCII compatible encoding where ASCII characters are represented by ASCII bytes and non-ASCII characters by sequences of 2-6 non-ASCII bytes [...]
To represent wide characters the char type is not suitable. For this reason the ISO C standard introduces [...] wchar_t, [...]
Sergei
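[Editorial sketch] For reference, the wide character counterparts of strlen() and strcmp() in ISO C are named wcslen() and wcscmp(), both declared in <wchar.h>. The two wrapper functions below are hypothetical and exist only to show the calls.

```c
#include <stddef.h>
#include <wchar.h>

/* Number of wide characters in a wchar_t string;
 * wcslen() is the wide counterpart of strlen(). */
size_t wide_length(const wchar_t *ws)
{
    return wcslen(ws);
}

/* Nonzero when two wide strings are equal;
 * wcscmp() is the wide counterpart of strcmp(). */
int wide_equal(const wchar_t *a, const wchar_t *b)
{
    return wcscmp(a, b) == 0;
}
```

With a wide literal such as L"Gr\u00fc\u00dfe", wide_length() returns 5, because each element of the array holds one whole character.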
Ludwig Nussel ludwig.nussel@suse.de wrote:
I've rediffed the patch against 1.3.25 some time ago. Apparently it still applies to 1.3.27. I didn't check whether it actually works though. You can find it here: http://www.suse.de/~lnussel/vdr/
Thanks - this stumbles over
+#include FT_FREETYPE_H
in line 728 of the patch when I do make clean already... where should this be defined? It's not in freetype2-devel-2.1.9-3.i586.rpm as it seems.
It's also in the vdr13 rpm packages on the ftp server. You have to rebuild the src.rpm using --with freetype to enable it though.
Let me check that.
Klaus Schmidinger Klaus.Schmidinger@cadsoft.de wrote:
So any implementation of UTF-8 in VDR will, first and foremost, have to be _completely_ enclosed in #ifdefs, so that a "clean" version of VDR can be compiled ;-)
Sounds like a bounty, especially for our eastern European colleagues, could help (heck, I don't mean the chocolate bar).
On Wednesday 20 July 2005 17:52, Sergei Haller wrote:
On Wed, 20 Jul 2005, Ludwig Nussel (LN) wrote:
Klaus Schmidinger wrote:
[...] To me, a character is an entity that's always the same size (preferably one byte). UTF-8 breaks with this, so if you have a string that has, e.g. a strlen() of 10, you can't be sure that this will be really 10 printing characters because there might be some "escaped" characters.
I think the confusion comes from the assumption that a character is exactly one byte long.
strlen counts bytes not characters.
in utf-8 a character can be up to 4 (or was it 8) bytes long.
Correct. The "ascii 7 bit" is one byte, everything else needs escape characters, e.g. German umlauts are 2 bytes each.
IIRC, there are new functions to count characters (wstrlen, wstrcmp, etc.)
Wrong. This is for wide characters, where every character uses 2 or 4 bytes.
In fact IF you want to support unicode in an application, you are better off making your application use wide characters inside (wchar_t), and make all external interfaces use UTF-8 (e.g. file input/output).
Using UTF-8 inside an application gets tricky, as you cannot use strlen to count the characters, for example.
Kind regards, Stefan
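[Editorial sketch] The design Stefan describes, UTF-8 at the external interfaces and wchar_t inside, needs a conversion step at the boundary. The C library offers mbstowcs() for this, but its result depends on the active locale, so the sketch below decodes by hand instead. The function name utf8_to_wide() is invented for this note. It handles 1- to 4-byte sequences, does not reject overlong forms or surrogates, and assumes wchar_t is wide enough for the decoded values (32 bits with glibc on Linux).

```c
#include <stddef.h>
#include <wchar.h>

/* Decode a UTF-8 string into wide characters.
 * Writes at most cap - 1 characters plus a terminating L'\0' into out.
 * Returns the number of characters written, or (size_t)-1 on malformed
 * input (in which case out is left unterminated). */
size_t utf8_to_wide(const char *in, wchar_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    if (cap == 0)
        return (size_t)-1;

    while (*p != 0 && n + 1 < cap) {
        unsigned long cp;
        int extra;              /* continuation bytes still expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else
            return (size_t)-1;  /* stray continuation or invalid lead byte */
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)
                return (size_t)-1;  /* truncated sequence */
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }
        out[n++] = (wchar_t)cp;
    }
    out[n] = L'\0';
    return n;
}
```

After the conversion, wcslen() on the output buffer gives the character count directly: the 7-byte UTF-8 string for "Grüße" becomes 5 wide characters.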
Harald Milz wrote:
Ludwig Nussel ludwig.nussel@suse.de wrote:
I've rediffed the patch against 1.3.25 some time ago. Apparently it still applies to 1.3.27. I didn't check whether it actually works though. You can find it here: http://www.suse.de/~lnussel/vdr/
Thanks - this stumbles over
+#include FT_FREETYPE_H
in line 728 of the patch when I do make clean already... where should this be defined? It's not in freetype2-devel-2.1.9-3.i586.rpm as it seems.
You are probably missing -I/usr/include/freetype2
cu Ludwig