Hi,
It seems that the epg.data file is encoded in UTF-8, but sometimes there are lines like this:
C S19.2E-133-3-263 18:00 GRÖD - RBS
The "Ö" is a single byte (0xD6), which is not valid UTF-8.
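For illustration, here is a minimal C sketch of why that byte is rejected. Both helper names are hypothetical and are not VDR functions. In UTF-8, "Ö" (U+00D6) takes two bytes, 0xC3 0x96. A lone 0xD6 is a lead byte that announces a two-byte sequence, so the next byte would have to be a continuation byte of the form 10xxxxxx. In "GRÖD" that next byte is 'D' (0x44), so the sequence is invalid.

```c
#include <stddef.h>

/* Hypothetical helper: encode a code point below U+0800 as UTF-8.
   Returns the number of bytes written to out (1 or 2). */
static size_t encode_utf8_upto_7ff(unsigned int cp, unsigned char out[2])
{
    if (cp < 0x80) {                 /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));   /* 110xxxxx: top 5 bits */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F)); /* 10xxxxxx: low 6 bits */
    return 2;
}

/* Hypothetical helper: nonzero if s starts a valid two-byte UTF-8 sequence,
   i.e. a lead byte 0xC2..0xDF followed by a continuation byte 10xxxxxx. */
static int starts_valid_2byte(const unsigned char *s)
{
    return s[0] >= 0xC2 && s[0] <= 0xDF && (s[1] & 0xC0) == 0x80;
}
```

With these, encode_utf8_upto_7ff(0xD6, buf) yields 0xC3 0x96, while starts_valid_2byte() is false for the bytes 0xD6 0x44 found in the EPG data.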
How could I avoid such characters in the epg.data file please?
TIA for any hints,
On 29 Nov 2015, at 14:04, Peter Münster pmlists@free.fr wrote:
Have you tried this (from the VDR “INSTALL” file)?
Workaround for providers not encoding their DVB SI table strings correctly
--------------------------------------------------------------------------
According to "ETSI EN 300 468" the default character set for SI data is ISO6937. But unfortunately some broadcasters actually use ISO-8859-9 or other encodings, but fail to correctly announce that. Users who want to set the default character set to something different can do this by using the command line option --chartab with something like ISO-8859-9.
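The selection rule behind this can be sketched as follows, based on Annex A of ETSI EN 300 468: a text field may begin with a selector byte below 0x20 that names its character table, and only when there is no selector does the default table apply. This is a simplified sketch, not VDR's libsi code. The function si_charset() and its signature are hypothetical, only a subset of the selectors is handled, and it assumes the --chartab value replaces just the default, as the INSTALL text above describes.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: pick the character table of a DVB SI text field.
   On success, writes an iconv-style charset name to name, stores the
   number of selector bytes to skip in *skip, and returns 0. Returns -1
   for selectors this sketch does not handle or if name is too small. */
static int si_charset(const unsigned char *s, size_t len, const char *def,
                      char *name, size_t namesz, size_t *skip)
{
    unsigned int part = 0;

    *skip = 0;
    if (len == 0 || s[0] >= 0x20) {
        /* No selector byte: the default table applies (ISO6937 unless
           the user overrides it). */
        const char *d = def ? def : "ISO6937";
        if (strlen(d) >= namesz)
            return -1;
        snprintf(name, namesz, "%s", d);
        return 0;
    }
    if (s[0] >= 0x01 && s[0] <= 0x0B && s[0] != 0x08) {
        part = s[0] + 4u;            /* 0x01 -> 8859-5 ... 0x0B -> 8859-15 */
        *skip = 1;
    } else if (s[0] == 0x10) {       /* 0x10 0x00 n -> ISO/IEC 8859-n */
        if (len < 3 || s[1] != 0x00 || s[2] < 1 || s[2] > 15)
            return -1;
        part = s[2];
        *skip = 3;
    } else if (s[0] == 0x15) {       /* UTF-8 */
        if (namesz < 6)
            return -1;
        snprintf(name, namesz, "UTF-8");
        *skip = 1;
        return 0;
    } else {
        return -1;                   /* 0x00, 0x08 (reserved), 0x11 and up */
    }
    if (snprintf(name, namesz, "ISO-8859-%u", part) >= (int)namesz)
        return -1;
    return 0;
}
```

For the raw bytes "GR\326D", which carry no selector, this yields ISO6937, or ISO-8859-9 when that is passed as the override.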
Klaus
On Sun, Nov 29 2015, Klaus Schmidinger wrote:
Thanks, I've tried it. But unfortunately, today I got this line:
C S19.2E-133-3-263 SVM - GR\326D
with the same 0xD6 character (\326 is its octal escape)...
Would it be possible/easy to patch vdr to filter out such errors? What is the right function to look at?
TIA for any help,
On 02 Dec 2015, at 20:45, Peter Münster pmlists@free.fr wrote:
Take a look at StripControlCharacters() or cEvent::FixEpgBugs() in epg.c.
Klaus
On Wed, Dec 02 2015, Klaus Schmidinger wrote:
> Take a look at StripControlCharacters() or cEvent::FixEpgBugs() in epg.c.
It seems that these functions only take care of the title and the description, but not the channel name.
Finally, I've patched vdr like this:
--8<---------------cut here---------------start------------->8---
--- epg.c~	2013-12-28 12:33:08.000000000 +0100
+++ epg.c	2015-12-06 15:54:58.312233837 +0100
@@ -1064,11 +1064,32 @@
      }
 }
 
+static char *StripFunny8bitCharacters(const char *src)
+{
+  static char dest[100];
+  strn0cpy(dest, src, 100);
+  char *s = dest;
+  int len = strlen(s);
+  while (len > 0) {
+        int l = Utf8CharLen(s);
+        uchar *p = (uchar *)s;
+        if (l == 1 && *p > 0x7F) { // this is not utf-8
+           memmove(s, p + 1, len); // we also copy the terminating 0!
+           len--;
+           l = 0;
+           }
+        s += l;
+        len -= l;
+        }
+  return dest;
+}
+
 void cSchedule::Dump(FILE *f, const char *Prefix, eDumpMode DumpMode, time_t AtTime) const
 {
   cChannel *channel = Channels.GetByChannelID(channelID, true);
   if (channel) {
-     fprintf(f, "%sC %s %s\n", Prefix, *channel->GetChannelID().ToString(), channel->Name());
+     fprintf(f, "%sC %s %s\n", Prefix, *channel->GetChannelID().ToString(),
+             StripFunny8bitCharacters(channel->Name()));
      const cEvent *p;
      switch (DumpMode) {
        case dmAll: {
--8<---------------cut here---------------end--------------->8---
It seems to work. Would it be possible to integrate this patch into vdr?
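As an alternative to the static 100-byte buffer in the patch, which truncates longer names and shares one buffer between calls, here is a minimal sketch that drops invalid bytes in place in a caller-owned buffer. It is plain C and does not use VDR's Utf8CharLen(). The helper names are hypothetical, and the sequence check is structural only (overlong forms and surrogates are not rejected).

```c
#include <stddef.h>

/* Length of the UTF-8 sequence starting at s (1..4), or 0 if the byte at
   s does not start a structurally valid sequence. */
static size_t utf8_seq_len(const unsigned char *s)
{
    size_t n, i;

    if (s[0] < 0x80)
        return 1;                  /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0)
        n = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        n = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        n = 4;
    else
        return 0;                  /* stray continuation or invalid byte */
    for (i = 1; i < n; i++)
        if ((s[i] & 0xC0) != 0x80) /* also stops at the terminating 0 */
            return 0;
    return n;
}

/* Remove, in place, every byte that is not part of a valid UTF-8 sequence.
   The string can only get shorter, so no extra buffer is needed. */
static void strip_invalid_utf8(char *str)
{
    unsigned char *src = (unsigned char *)str;
    unsigned char *dst = src;

    while (*src) {
        size_t n = utf8_seq_len(src);
        if (n == 0) {
            src++;                 /* drop the offending byte */
            continue;
        }
        while (n--)
            *dst++ = *src++;       /* keep the whole valid sequence */
    }
    *dst = '\0';
}
```

Like the patch, it turns "GR\326D" into "GRD": the lone 0xD6 is dropped and the following 'D' is kept.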
On 06 Dec 2015, at 20:55, Peter Münster pmlists@free.fr wrote:
> It seems that these functions only take care of the title and the description, but not the channel name.
Sorry, I missed that.
> It seems to work. Would it be possible to integrate this patch into vdr?
Well, first we should investigate why this isn’t set correctly in libsi/si.c. That’s the place where such fixes should actually be done. I’ll look into this once I have my VDR development environment up and running at my new place…
Klaus
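Fixing this at decode time, as suggested above, would mean converting the raw bytes with the correct table before they ever reach epg.data, so nothing has to be stripped afterwards. As a rough illustration only (not libsi's actual code; to_utf8() is a hypothetical name), here is a sketch using the POSIX iconv API: decoded as ISO-8859-9, the byte 0xD6 becomes the valid UTF-8 sequence 0xC3 0x96.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: convert the NUL-terminated string in (encoded in charset
   from) to UTF-8 in out. Returns 0 on success, or -1 if the charset is
   unknown, the input is invalid, or out (outsz >= 1) is too small. */
static int to_utf8(const char *from, const char *in, char *out, size_t outsz)
{
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;        /* iconv() takes char ** for its input */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsz - 1;    /* keep room for the terminating 0 */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft); /* flush shift state */
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;
    *outp = '\0';
    return 0;
}
```

On glibc the iconv functions are part of libc; other systems may need -liconv when linking.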