LC_NUMERIC in wprintf(3)

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

LC_NUMERIC in wprintf(3)

Jan Stary
The wprintf(3) manpage says

        The decimal point character is defined
        in the program's locale (category LC_NUMERIC)

but LC_NUMERIC is ignored in OpenBSD's C library,
as explained in setlocale(3).

Would it be an improvement to remove that sentence?
(Removing a needless newline while here.)

        Jan


Index: wprintf.3
===================================================================
RCS file: /cvs/src/lib/libc/stdio/wprintf.3,v
retrieving revision 1.5
diff -u -p -r1.5 wprintf.3
--- wprintf.3 1 Dec 2017 10:56:07 -0000 1.5
+++ wprintf.3 10 Jan 2019 13:29:41 -0000
@@ -578,14 +578,9 @@ is
 .Ql %% .
 .El
 .Pp
-The decimal point
-character is defined in the program's locale (category
-.Dv LC_NUMERIC ) .
-.Pp
 In no case does a non-existent or small field width cause truncation of
 a numeric field; if the result of a conversion is wider than the field
-width, the
-field is expanded to contain the conversion result.
+width, the field is expanded to contain the conversion result.
 .Sh SEE ALSO
 .Xr btowc 3 ,
 .Xr fputws 3 ,

Reply | Threaded
Open this post in threaded view
|

Re: LC_NUMERIC in wprintf(3)

Jan Stary
On Jan 10 14:43:39, [hidden email] wrote:

> The wprintf(3) manpage says
>
> The decimal point character is defined
> in the program's locale (category LC_NUMERIC)
>
> but LC_NUMERIC is ignored in OpenBSD's C library,
> as explained in setlocale(3).
>
> Would it be an improvement to remove that sentence?
> (Removing a needless newline while here.)
>
> Jan
>
>
> Index: wprintf.3
> ===================================================================
> RCS file: /cvs/src/lib/libc/stdio/wprintf.3,v
> retrieving revision 1.5
> diff -u -p -r1.5 wprintf.3
> --- wprintf.3 1 Dec 2017 10:56:07 -0000 1.5
> +++ wprintf.3 10 Jan 2019 13:29:41 -0000
> @@ -578,14 +578,9 @@ is
>  .Ql %% .
>  .El
>  .Pp
> -The decimal point
> -character is defined in the program's locale (category
> -.Dv LC_NUMERIC ) .
> -.Pp
>  In no case does a non-existent or small field width cause truncation of
>  a numeric field; if the result of a conversion is wider than the field
> -width, the
> -field is expanded to contain the conversion result.
> +width, the field is expanded to contain the conversion result.
>  .Sh SEE ALSO
>  .Xr btowc 3 ,
>  .Xr fputws 3 ,

same for wscanf(3):


Index: wscanf.3
===================================================================
RCS file: /cvs/src/lib/libc/stdio/wscanf.3,v
retrieving revision 1.2
diff -u -p -r1.2 wscanf.3
--- wscanf.3 2 Nov 2011 22:29:07 -0000 1.2
+++ wscanf.3 10 Jan 2019 14:14:08 -0000
@@ -389,10 +389,6 @@ a conversion, although it can be suppres
 flag.
 .El
 .Pp
-The decimal point
-character is defined in the program's locale (category
-.Dv LC_NUMERIC ) .
-.Pp
 For backwards compatibility, a
 .Dq conversion
 of

Reply | Threaded
Open this post in threaded view
|

Re: LC_NUMERIC in wprintf(3)

Ingo Schwarze
In reply to this post by Jan Stary
Hi Jan,

Jan Stary wrote on Thu, Jan 10, 2019 at 02:43:39PM +0100:

> The wprintf(3) manpage says
>
> The decimal point character is defined
> in the program's locale (category LC_NUMERIC)
>
> but LC_NUMERIC is ignored in OpenBSD's C library,
> as explained in setlocale(3).

Right, i agree that is bogus.

> Would it be an improvement to remove that sentence?

I wouldn't remove it outright.

That stuff is a very serious trap not only for the unwary, but also
for experienced programmers.  Less than a month ago, i discussed
the matter with a senior programmer from the GNU project who has
been maintaining important software over there for a long time and
guess what he said?  Something like "yes, you are right, that stuff
is dangerous.  I had a nasty bug in one of my programs just last
week because a printf mangled numbers in a locale-dependent way i
didn't anticipate."

I think this is exactly what CAVEATS is good for.

OK?
  Ingo


Index: wprintf.3
===================================================================
RCS file: /cvs/src/lib/libc/stdio/wprintf.3,v
retrieving revision 1.5
diff -u -p -r1.5 wprintf.3
--- wprintf.3 1 Dec 2017 10:56:07 -0000 1.5
+++ wprintf.3 10 Jan 2019 15:29:20 -0000
@@ -225,17 +225,17 @@ A
 .Cm +
 overrides a space if both are used.
 .It Sq Cm '
-Decimal conversions
-.Cm ( d , u ,
+On
+.Ox ,
+this flag has no effect.
+On other systems, it may cause the insertion of
+.Xr locale 1 Ns -dependent
+thousands separator characters into the integral parts of arguments
+of the
+.Cm d , i , u , f ,
 or
-.Cm i )
-or the integral portion of a floating point conversion
-.Cm ( f
-or
-.Cm F )
-should be grouped and separated by thousands using
-the non-monetary separator returned by
-.Xr localeconv 3 .
+.Cm F
+conversions.
 .El
 .It
 An optional decimal digit string specifying a minimum field width.
@@ -578,10 +578,6 @@ is
 .Ql %% .
 .El
 .Pp
-The decimal point
-character is defined in the program's locale (category
-.Dv LC_NUMERIC ) .
-.Pp
 In no case does a non-existent or small field width cause truncation of
 a numeric field; if the result of a conversion is wider than the field
 width, the
@@ -605,3 +601,21 @@ and
 functions
 conform to
 .St -isoC-99 .
+.Sh CAVEATS
+On systems other than
+.Ox ,
+the
+.Dv LC_NUMERIC
+.Xr locale 1
+category can cause erratic output, for example different characters
+substituted for the decimal point or thousands separator characters
+inserted into numbers.
+For security reasons, make sure that portable programs calling these
+functions never call
+.Xr setlocale 3
+with the
+.Dv LC_ALL
+or
+.Dv LC_NUMERIC
+categories but only use
+.Dv LC_CTYPE .

Reply | Threaded
Open this post in threaded view
|

Re: LC_NUMERIC in wprintf(3)

Ted Unangst-6
Ingo Schwarze wrote:

> That stuff is a very serious trap not only for the unwary, but also
> for experienced programmers.  Less than a month ago, i discussed
> the matter with a senior programmer from the GNU project who has
> been maintaining important software over there for a long time and
> guess what he said?  Something like "yes, you are right, that stuff
> is dangerous.  I had a nasty bug in one of my programs just last
> week because a printf mangled numbers in a locale-dependent way i
> didn't anticipate."
>
> I think this is exactly what CAVEATS is good for.

ok. mangled numbers from code generators is one of my favorites. :)

Reply | Threaded
Open this post in threaded view
|

Re: LC_NUMERIC in wprintf(3)

Ingo Schwarze
Hi Ted & Scott,

Ted Unangst wrote on Fri, Jan 11, 2019 at 04:03:32AM -0500:
> Ingo Schwarze wrote:

>> That stuff is a very serious trap not only for the unwary, but also
>> for experienced programmers.  Less than a month ago, i discussed
>> the matter with a senior programmer from the GNU project who has
>> been maintaining important software over there for a long time and
>> guess what he said?  Something like "yes, you are right, that stuff
>> is dangerous.  I had a nasty bug in one of my programs just last
>> week because a printf mangled numbers in a locale-dependent way i
>> didn't anticipate."
>>
>> I think this is exactly what CAVEATS is good for.

> ok. mangled numbers from code generators is one of my favorites. :)

Committed.

Thanks for checking!
  Ingo

Reply | Threaded
Open this post in threaded view
|

Re: LC_NUMERIC in wprintf(3)

Ingo Schwarze
In reply to this post by Jan Stary
Hi,

Jan Stary wrote on Thu, Jan 10, 2019 at 03:15:38PM +0100:

> same for wscanf(3):

Fixed, but in a way similar to wprintf(3).

Yours,
  Ingo


 $ man wscanf | col -b | grep -A 6 ^CAVEATS
CAVEATS
     On systems other than OpenBSD, the LC_NUMERIC locale(1) category can
     cause parsing failures, for example because it may cause a different
     character to be expected instead of the decimal point.  For security
     reasons, make sure that portable programs calling these functions never
     call setlocale(3) with the LC_ALL or LC_NUMERIC categories but only use
     LC_CTYPE.

Reply | Threaded
Open this post in threaded view
|

Re: LC_NUMERIC in wprintf(3)

Todd C. Miller-3
In reply to this post by Ingo Schwarze
Don't we have the same issue with printf(3) and scanf(3)?
For example, in src/lib/libc/stdio/vfprintf.c:1022

        decimal_point = nl_langinfo(RADIXCHAR);

Also, anything using strtod(3), including scanf(3), is likewise
affected by gdtoa's use of localeconv()->decimal_point.

 - todd

Reply | Threaded
Open this post in threaded view
|

Re: LC_NUMERIC in wprintf(3)

Ingo Schwarze
Hi Todd,

Todd C. Miller wrote on Fri, Jan 11, 2019 at 12:31:02PM -0700:

> Don't we have the same issue with printf(3) and scanf(3)?
> For example, in src/lib/libc/stdio/vfprintf.c:1022
>
> decimal_point = nl_langinfo(RADIXCHAR);
>
> Also, anything using strtod(3), including scanf(3), is likewise
> affected by gdtoa's use of localeconv()->decimal_point.

Sure.

I did not attempt to hunt down all offenders.
Would you regard that as a whorthwhile endeavour?
Note that would not only require considering bogus leftovers
in our code, but also considering what other systems might do.

I just thought it made sense to remove misleading statements about
our own code, but not not without warning about related portability
issues while there.

Yours,
  Ingo

Reply | Threaded
Open this post in threaded view
|

Re: LC_NUMERIC in wprintf(3)

Todd C. Miller-3
On Sat, 12 Jan 2019 16:45:46 +0100, Ingo Schwarze wrote:

> I did not attempt to hunt down all offenders.
> Would you regard that as a whorthwhile endeavour?
> Note that would not only require considering bogus leftovers
> in our code, but also considering what other systems might do.
>
> I just thought it made sense to remove misleading statements about
> our own code, but not not without warning about related portability
> issues while there.

I just thought that if we were going to mention this problem at
all, then we should do so in strtod() as well.  And since printf(3)
and scanf(3) suffer from the same issue as wprintf(3) and wscanf(3)
I see no reason to only include the warning in the wide versions.

I'm not advocating that we try to hunt down all possible affected
functions.  Actually, I wish there was a single central place we
could document this where people would see it, but I'm not sure
there really is one.

 - todd

Reply | Threaded
Open this post in threaded view
|

Re: LC_NUMERIC in wprintf(3)

Ingo Schwarze
Hi Todd,

Todd C. Miller wrote on Sat, Jan 12, 2019 at 08:04:31PM -0700:
> On Sat, 12 Jan 2019 16:45:46 +0100, Ingo Schwarze wrote:
 
>> I did not attempt to hunt down all offenders.
>> Would you regard that as a whorthwhile endeavour?
>> Note that would not only require considering bogus leftovers
>> in our code, but also considering what other systems might do.
>>
>> I just thought it made sense to remove misleading statements about
>> our own code, but not not without warning about related portability
>> issues while there.

> I just thought that if we were going to mention this problem at
> all, then we should do so in strtod() as well.  And since printf(3)
> and scanf(3) suffer from the same issue as wprintf(3) and wscanf(3)
> I see no reason to only include the warning in the wide versions.

Hm, that makes sense.  We can mention cases that we are already
aware of even if we are not sure the list is complete, even if it
is not easy to make it complete.

> I'm not advocating that we try to hunt down all possible affected
> functions.  Actually, I wish there was a single central place we
> could document this where people would see it, but I'm not sure
> there really is one.

I think there is.  The only functions that are really dangerous in
this respect are setlocale(3) and uselocale(3).  As long as you
don't call either of these two, not much can go wrong.

So adding a subsection to the setlocale(3) manual, containing lists
of functions that are likely to break when you call setlocale(3)
with bad arguments on non-OpenBSD systems, might make sense.
Even if we can't guarantee completeness of these lists, they will
give programmers a better idea about the approximate scopes and sizes
of the traps.

Once we have that, very short pointers in individual pages become
sufficient: one short sentence including .Xr setlocale 3 below
CAVEATS.  Less bloat everywhere, better overview in the one critical
place.

Does that make sense?

Yours,
  Ingo

Reply | Threaded
Open this post in threaded view
|

Re: LC_NUMERIC in wprintf(3)

Todd C. Miller-3
On Sun, 13 Jan 2019 05:20:47 +0100, Ingo Schwarze wrote:

> So adding a subsection to the setlocale(3) manual, containing lists
> of functions that are likely to break when you call setlocale(3)
> with bad arguments on non-OpenBSD systems, might make sense.
> Even if we can't guarantee completeness of these lists, they will
> give programmers a better idea about the approximate scopes and sizes
> of the traps.
>
> Once we have that, very short pointers in individual pages become
> sufficient: one short sentence including .Xr setlocale 3 below
> CAVEATS.  Less bloat everywhere, better overview in the one critical
> place.
>
> Does that make sense?

Sure, that would be similar to how we list signal safe functions
in sigaction(2).

 - todd

Reply | Threaded
Open this post in threaded view
|

Re: LC_NUMERIC in wprintf(3)

Ingo Schwarze
Hi Todd,

Todd C. Miller wrote on Sun, Jan 13, 2019 at 08:06:22AM -0700:
> On Sun, 13 Jan 2019 05:20:47 +0100, Ingo Schwarze wrote:
 

>> So adding a subsection to the setlocale(3) manual, containing lists
>> of functions that are likely to break when you call setlocale(3)
>> with bad arguments on non-OpenBSD systems, might make sense.
>> Even if we can't guarantee completeness of these lists, they will
>> give programmers a better idea about the approximate scopes and sizes
>> of the traps.
>>
>> Once we have that, very short pointers in individual pages become
>> sufficient: one short sentence including .Xr setlocale 3 below
>> CAVEATS.  Less bloat everywhere, better overview in the one critical
>> place.
>>
>> Does that make sense?

> Sure, that would be similar to how we list signal safe functions
> in sigaction(2).

Here is a first draft.
Maybe it is already good enough for commit; it can always be added
to when additional dangers are discovered.

OK?

I'd like to keep the cleanup in the individual pages separate from
the commit to setlocale(3), or i fear the diff might get unwieldy.

Yours,
  Ingo


Index: setlocale.3
===================================================================
RCS file: /cvs/src/lib/libc/locale/setlocale.3,v
retrieving revision 1.22
diff -u -r1.22 setlocale.3
--- setlocale.3 4 Apr 2018 14:57:51 -0000 1.22
+++ setlocale.3 13 Jan 2019 18:02:11 -0000
@@ -78,7 +78,8 @@
 C library.
 A category of
 .Dv LC_ALL
-sets the entire locale generically.
+sets the entire locale generically, which is strongly discouraged for
+security reasons in portable programs.
 .Pp
 The syntax and semantics of the
 .Fa locale
@@ -126,6 +127,29 @@
 The only function in the library that sets the locale is
 .Fn setlocale ;
 the locale is never changed as a side effect of some other routine.
+.Pp
+The
+.Dv LC_CTYPE
+category modifies the behaviour of at least the following functions:
+.Xr iswctype 3 ,
+.Xr mblen 3 ,
+.Xr mbrlen 3 ,
+.Xr mbrtowc 3 ,
+.Xr mbsrtowcs 3 ,
+.Xr mbstowcs 3 ,
+.Xr mbtowc 3 ,
+.Xr towctrans 3 ,
+.Xr towlower 3 ,
+.Xr towupper 3 ,
+.Xr wcrtomb 3 ,
+.Xr wcscasecmp 3 ,
+.Xr wcsrtombs 3 ,
+.Xr wcstombs 3 ,
+.Xr wctomb 3 ,
+.Xr wctrans 3 ,
+.Xr wctype 3 ,
+and the functions documented in
+.Xr iswalnum 3 .
 .Sh RETURN VALUES
 In case of success,
 .Fn setlocale
@@ -205,3 +229,99 @@
 .Fn setlocale
 function first appeared in
 .Bx 4.4 .
+.Sh CAVEATS
+On systems other than
+.Ox ,
+calling
+.Fn setlocale
+or
+.Xr uselocale 3
+with a
+.Fa category
+other than
+.Dv LC_CTYPE
+can cause erratic behaviour of many library functions.
+For security reasons, make sure that portable programs never do that
+but only use
+.Dv LC_CTYPE .
+.Pp
+For example, the following functions may be affected.
+The list is probably incomplete.
+For example, additional library functions may be impacted
+if they directly or indirectly call affected functions,
+or if they attempt to imitate aspects of their behaviour.
+Functions that are not standardized may be affected, too.
+.Bl -tag -width Ds
+.It Dv LC_COLLATE
+.Xr glob 3 ,
+.Xr strcoll 3 ,
+.Xr strxfrm 3 ,
+.Xr wcscoll 3 ,
+.Xr wcsxfrm 3 ,
+and the functions documented in
+.Xr regexec 3
+.It Dv LC_MESSAGES
+.Xr catgets 3 ,
+.Xr catopen 3 ,
+.Xr nl_langinfo 3 ,
+.Xr perror 3 ,
+.Xr psignal 3 ,
+.Xr strerror 3 ,
+.Xr strsignal 3 ,
+and the functions documented in
+.Xr err 3
+.It Dv LC_MONETARY
+.Xr localeconv 3 ,
+.Xr nl_langinfo 3 ,
+.Fn strfmon
+.It Dv LC_NUMERIC
+.Xr atof 3 ,
+.Xr localeconv 3 ,
+.Xr nl_langinfo 3 ,
+.Fn strfmon ,
+and the functions documented in
+.Xr printf 3 ,
+.Xr scanf 3 ,
+.Xr strtod 3 ,
+.Xr wcstod 3 ,
+.Xr wprintf 3 ,
+.Xr wscanf 3 .
+This category is particularly dangerous because it can cause bugs
+in the parsing and formatting of numbers, for example, but not
+limited to, failures to recognize or properly write decimal points.
+.It Dv LC_TIME
+.Fn getdate ,
+.Xr nl_langinfo 3 ,
+.Xr strftime 3 ,
+.Xr strptime 3 .
+Similarly, this is prone to causing bugs in the parsing and formatting
+of date strings.
+.It Dv LC_CTYPE
+On systems other than
+.Ox ,
+this category may affect the behaviour of additional functions,
+for example:
+.Xr btowc 3 ,
+.Xr isalnum 3 ,
+.Xr isalpha 3 ,
+.Xr isblank 3 ,
+.Xr iscntrl 3 ,
+.Xr isdigit 3 ,
+.Xr isgraph 3 ,
+.Xr islower 3 ,
+.Xr isprint 3 ,
+.Xr ispunct 3 ,
+.Xr isspace 3 ,
+.Xr isupper 3 ,
+.Xr isxdigit 3 ,
+.Xr mbsinit 3 ,
+.Xr strcasecmp 3 ,
+.Xr strcoll 3 ,
+.Xr strxfrm 3 ,
+.Xr tolower 3 ,
+.Xr toupper 3 ,
+.Xr vis 3 ,
+.Xr wcscoll 3 ,
+.Xr wcsxfrm 3 ,
+.Xr wctob 3
+.El