Hi,
I am making the muggle plugin work with UTF-8 and have a little problem:
since asprintf leads to segfaults if fed incorrect UTF-8 characters, I wanted to write a wrapper function that checks the return value of asprintf. However, I have a problem with the variable argument list and the va_* macros. gdb shows that, in the following example, in
res=asprintf (strp, fmt, ap);
ap is interpreted not as a list of arguments but as an integer.
What is wrong here?
BTW I am quite sure that vdr will sometimes coredump since it never checks the return value of asprintf. One suspect would be somebody who used a latin1 charset, had special characters like äöü in file names, and then changed to UTF-8 without converting the file names to UTF-8. If vdr then passes such a file name to asprintf, corrupted memory results. Might be difficult to debug remotely.
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

int msprintf(char **strp, const char *fmt, ...)
{
    va_list ap;
    int res;
    va_start (ap, fmt);
    res=asprintf (strp, fmt, ap);
    va_end (ap);
}

int main()
{
    char *buffer;

    asprintf(&buffer,"test: %d\n",5);
    write(1,buffer,strlen(buffer));
    free(buffer);

    msprintf(&buffer,"test: %d\n",5);
    write(1,buffer,strlen(buffer));
    free(buffer);
}
On 02/10/08 16:06, Wolfgang Rohdewald wrote:
You could use VDR's cString::sprintf() instead. This is probably also what I am going to do in the VDR core code, to avoid asprintf() altogether. The single leftover vasprintf() call in cString::sprintf() can then be made safe.
Klaus
On Sonntag, 10. Februar 2008, Klaus Schmidinger wrote:
vasprintf was a good hint - I only had to change asprintf to vasprintf, same arguments. Now it works as expected.
I will use my msprintf until you have made cString::sprintf() safe.
Thank you!
int msprintf(char **strp, const char *fmt, ...)
{
    va_list ap;
    va_start (ap, fmt);
    int res=vasprintf (strp, fmt, ap);
    va_end (ap);
    return res;
}
Wolfgang Rohdewald wrote:
I never understood what the problem is with utf8 and asprintf, since utf8 is mostly backwards compatible with ASCIIZ, and asprintf probably doesn't even know the difference between utf8 and ascii. What special handling does asprintf do with utf8? Is there some example that causes the trouble?
Worst case I can imagine would be that there's an invalid 0 byte inside a utf8 multibyte char, and even this would just result in a utf8 string that terminates with an incomplete char - and shouldn't handling such crap be the job of whatever processes the utf8 string later on? At least IMHO it would be wise to count any 0 byte as string end.
Cheers,
Udo
On Sonntag, 10. Februar 2008, Udo Richter wrote:
Worst case I can imagine would be that there's an invalid 0 byte inside an utf8 multibyte char
printf and family sometimes have to count characters, so I suppose they have to scan UTF
I know from mysql and postgresql that they also scan every UTF string passed from the client for illegal chars and abort the transaction if they find any.
My problem code:
mgDb::Build_cddbid(const mgSQLString& artist) const
{
    char *s;
    asprintf(&s,"%ld-%.9s",random(),artist.original());
segfaults only if illegal utf8 chars appear in artist.original()
asprintf returns -1, so s is nothing that could be freed, and this gives a nice backtrace:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1319449712 (LWP 22989)]
0xb7bf57ea in free () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0 0xb7bf57ea in free () from /lib/tls/i686/cmov/libc.so.6
#1 0xb7986908 in mgDb::Build_cddbid (this=0x86ed8e8, artist=@0xb15aa698) at mg_db.c:1023
If I change %.9s to %s, everything is fine.
I cannot easily reduce that to a small test case; if I try like this, it works:
char artist[50];
strcpy(artist,"Celine Dion");
artist[1]=0xe9;
asprintf(&buffer,"%ld-%.9s",random(),artist);
printf(buffer);
free(buffer);
I demand that Wolfgang Rohdewald may or may not have written...
printf and family sometimes have to count characters, so I suppose they have to scan UTF
No; they only ever count bytes. The encoding is irrelevant.
[snip]
Wolfgang Rohdewald wrote:
As you can see it doesn't segfault on asprintf but on free().
if(asprintf(...) >= 0) { printf(...); free(...); }
Or just use normal snprintf, as the number of characters to print is fixed anyway, so you don't need a variable-sized buffer.
cu Ludwig
On Montag, 11. Februar 2008, Ludwig Nussel wrote:
As you can see it doesn't segfault on asprintf but on free().
I did see that. I did not say asprintf itself segfaults, but it does lead to segfaults.
I do not want to change dozens of places like that. I just want one single point which can emit an error message so I can then see what has to be done for each individual place. Most of the asprintf calls will never get into trouble anyway. But if a user reports a problem I prefer an error message over some vague description.
Or just use normal snprintf, as the number of characters to print is fixed anyway, so you don't need a variable-sized buffer.
This is just a minimal sample. The real code has variable length strings.
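A minimal sketch of such a single checking point, assuming glibc or another libc that provides vasprintf. The name msprintf_checked and the wording of the error message are made up for illustration. Setting *strp to NULL on failure is a choice made in this sketch so that a later free() is harmless; vasprintf itself makes no such promise.

```c
#define _GNU_SOURCE   /* vasprintf() is a GNU/BSD extension */
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: formats into a newly allocated string and
 * returns whatever vasprintf() returned (-1 on failure).
 * On failure it reports the format string on stderr and sets *strp
 * to NULL, because the man page leaves *strp undefined in that case. */
int msprintf_checked(char **strp, const char *fmt, ...)
{
    va_list ap;
    int res;

    assert(strp != NULL && fmt != NULL);

    va_start(ap, fmt);
    res = vasprintf(strp, fmt, ap);   /* v-variant: takes the va_list itself */
    va_end(ap);

    if (res < 0) {
        *strp = NULL;                 /* makes a later free(*strp) a no-op */
        fprintf(stderr, "msprintf_checked: formatting \"%s\" failed\n", fmt);
    }
    return res;
}
```

A caller would still test the result before using the string, for example `if (msprintf_checked(&buffer, "test: %d\n", 5) >= 0) fputs(buffer, stdout);`, and can call `free(buffer)` unconditionally afterwards.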
On Montag, 11. Februar 2008, Ludwig Nussel wrote:
of course. See above.
Udo Richter wrote:
The manpage explicitly says that the content of s is undefined in case of error. So even if it works you can't really count on it. You can't get around checking the return value.
cu Ludwig
On Montag, 11. Februar 2008, Udo Richter wrote:
Well, that leads to the question whether s is unchanged in case of a -1 error return, and whether this would work:
I can confirm that. The man page however says the value will be undefined.
My current understanding is:
1. don't forget to call setlocale! Normally setlocale(LC_ALL,"")
2. if locale is UTF-8, asprintf returns -1 if the string contains illegal UTF-8 characters anywhere
3. this and out of memory are the only reasons I know for result -1. The man page for asprintf says there could be other errors than out of memory but mentions none.
4. If the result is -1, the buffer pointer stayed unchanged in my tests, but the man page says its contents are undefined in that case
5. if locale is UTF-8 and a maximum length is defined as in %.9s, and if %.9s would cut a multibyte char, only 8 chars will be used. See example from Ludwig Nussel.
What I don't know is where in the man pages this is explained - I did not find anything about it, neither in man asprintf nor in man printf.
Udo Richter wrote:
asprintf needs to check for multibyte characters to not cut them in the middle and produce invalid output.
cu Ludwig
I demand that Ludwig Nussel may or may not have written...
[snip]
asprintf needs to check for multibyte characters to not cut them in the middle and produce invalid output.
No - it's encoding-neutral. What you want is your own version which does that (but if you still think that that should be called asprintf, you may as well rewrite printf etc. while you're at it), or conversion to/from wide character strings (and a version of asprintf() which handles wchar_t*).
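One way to read the "your own version" suggestion is to compute a safe cut position first and then pass it as an explicit precision. The following is only a sketch under stated assumptions: the helper name utf8_prefix_len is hypothetical, the input is assumed to be valid UTF-8, and the function looks only at byte patterns, not at the locale.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the largest length <= max_bytes at which
 * s can be cut without splitting a UTF-8 multibyte sequence.
 * Assumes s is valid UTF-8. */
size_t utf8_prefix_len(const char *s, size_t max_bytes)
{
    size_t len;

    assert(s != NULL);

    len = strlen(s);
    if (len <= max_bytes)
        return len;              /* whole string fits, nothing to cut */

    len = max_bytes;
    /* Bytes of the form 10xxxxxx are continuation bytes. If the first
     * byte after the cut is one, the cut falls inside a character, so
     * back up to that character's lead byte and drop it entirely. */
    while (len > 0 && ((unsigned char)s[len] & 0xC0) == 0x80)
        len--;
    return len;
}
```

The result can then be used with a `%.*s` conversion, for example `asprintf(&buffer, "%.*s\n", (int)utf8_prefix_len(artist, 2), artist);`, which keeps the formatting function itself encoding-neutral.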
Darren Salt wrote:
Try the following with 'LANG=C' and 'LANG=de_DE.UTF-8'. You will notice that in the latter case it will not cut the umlaut.
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <locale.h>

int main(void)
{
    char* buffer;
    char artist[] = "Haegar";
    int ret;
    setlocale(LC_ALL, "");
    artist[1]=0xc3;
    artist[2]=0xa4;
    ret = asprintf(&buffer,"%.2s\n",artist);
    printf("%d bytes\n", ret);
    printf(buffer);
    free(buffer);
    return 0;
}
cu Ludwig
I demand that Ludwig Nussel may or may not have written...
Try the following with 'LANG=C' and 'LANG=de_DE.UTF-8'. You will notice that in the latter case it will not cut the umlaut.
[snip code - hmm, dodgy use of printf]
Interesting. It omits it entirely. But the rest of my point still stands - it still counts bytes.
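The byte-counting side of this can be checked in isolation. The sketch below uses a hypothetical helper, format_with_precision, that applies a `%.*s` precision through snprintf. It assumes the program never calls setlocale, so it runs in the default "C" locale; what happens under a UTF-8 locale is what the thread reports and may depend on the libc version.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats s with "%.*s" into out and returns the
 * byte count reported by snprintf(). */
int format_with_precision(char *out, size_t outsize, const char *s, int precision)
{
    assert(out != NULL && s != NULL);
    return snprintf(out, outsize, "%.*s", precision, s);
}
```

In the "C" locale, calling it on the bytes 'H', 0xc3, 0xa4, 'g', 'a', 'r' with precision 2 returns 2 and stores 'H' followed by 0xc3, i.e. the umlaut is cut in the middle because the precision counts bytes.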
Wolfgang Rohdewald wrote:
since asprintf leads to segfaults if fed incorrect UTF-8 characters,
It's not asprintf that segfaults but the later call to free() on a pointer that was never set.
use vasprintf
Even if you use vasprintf to make the function actually work, you still need to check the return value of vasprintf; otherwise this wrapper would be kind of useless.
cu Ludwig