strptime: %e and leading space

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

strptime: %e and leading space

Klemens Nanni-2
strftime(3) has a clear distinction between "%e" and "%d":

        %d    is replaced by the day of the month as a decimal number (01-31).

        %e    is replaced by the day of month as a decimal number (1-31); single
              digits are preceded by a blank.

Whereas strptime(3), being the converse, treats them equally:

        %d    the day of month [1,31]; leading zeros are permitted but not
              required.

        %e    the same as %d.

This makes it impossible to convert back and forth unless one uses the
additional conversion specifier "%n" to parse "any whitespace" before
single digit numbers.


I came across this thanks to a nicely detailed bug report about
mail/isync's `CopyArrivalDate' feature, which uses strptime(3).  Quote:

        The IMAP protocol specifies a date format beginning with a
        day-of-month space-padded to two characters. The %d specifier in
        glibc's strptime(3) will consume a space-padded day of month,
        but OpenBSD's %d only accepts leading zeroes.

That is, formats used in the protocol and/or program such as

        " 4-Mar-2018 16:49:25 -0500" ("%d-%b-%Y %H:%M:%S ")

work with glibc, but not ours, resulting in failures when mbsync tries
to propagate the arrival time together with the messages.


I've checked various implementations with regard to differences between
both conversion specifiers:

glibc %d same as %e[0], both allow multiple leading spaces[1]
musl %d same as %e[2], no spaces allowed[3]
Illumos %d same as %e[4], both allow one leading space
FreeBSD %d other than %e[5], %e allows one leading space[6]

glibc and Illumos sparsely document them with
"The day of the month (1-31)".  musl appearently has no manual and
FreeBSD's strptime(3) refers to strftime(3); there they make the same
distinction we do, but the strptime behaviour in question here can still
only be implied from that:

        %d    is replaced by the day of the month as a decimal number (01-31).

        %e    is replaced by the day of the month as a decimal number (1-31);
              single digits are preceded by a blank.

So Using "%n%d" works in all implementations and looks like a portable
way to address this, but is not exactly the same as _any_ number of
spaces is matched.

Making "%e" in strptime accept blanks added through "%e" in strftime
seems more reasonable to me.  Diff below does that, effectively behaving
like FreeBSD, but also tries to document it explicitly.

Tested on amd64 with the following snippet:

$ cat test.c
#include <time.h>
#include <stdio.h>
int main() {
        struct tm datetime;
        char *fmts[] = { "1", "02", " 3", " 04", "  5", "006" };
        int i;
        for (i = 0; i < sizeof(fmts)/sizeof(fmts[0]); ++i) {
                if (strptime(fmts[i], "%e", &datetime) != NULL)
                        printf("'%s':\t%d\n", fmts[i], datetime.tm_mday);
                else
                        printf("'%s':\tinvalid\n", fmts[i]);
        }
        return 0;
}
$ cc test.c && ./a.out
'1':    1
'02':   2
' 3':   3
' 04':  4
'  5':  invalid
'006':  invalid

regress/lib/libc/time/strptime passes, but is only uses "%d", not "%e",
In case we go this way, I'll happily add more tests.

Feedback is greatly appreciated on this.  Specifically, I'm unsure about
potential downsides or pitfalls this can bring, although other
implementations seem to do fine.

Could changing the existing behaviour of "%e" break certain use cases?
Are "%d" and "%e" are deliberately kept the same? If so, why?

0: https://git.launchpad.net/glibc/tree/time/strptime_l.c#n74
1: https://git.launchpad.net/glibc/tree/time/strptime_l.c#n518

2: https://git.musl-libc.org/cgit/musl/tree/src/time/strptime.c#n53
3: https://git.musl-libc.org/cgit/musl/tree/src/time/strptime.c#n162

4: https://github.com/illumos/illumos-gate/blob/master/usr/src/lib/libc/port/locale/strptime.c#L331

5: https://svnweb.freebsd.org/base/head/lib/libc/stdtime/strptime.c?revision=340106&view=markup#l401
6: https://svnweb.freebsd.org/base/head/lib/libc/stdtime/strptime.c?revision=340106&view=markup#l420

Index: lib/libc/time/strptime.3
===================================================================
RCS file: /cvs/src/lib/libc/time/strptime.3,v
retrieving revision 1.26
diff -u -p -r1.26 strptime.3
--- lib/libc/time/strptime.3 21 Jan 2019 21:35:58 -0000 1.26
+++ lib/libc/time/strptime.3 17 Feb 2019 21:49:23 -0000
@@ -96,8 +96,8 @@ leading zeros are permitted but not requ
 .It Cm \&%D
 the date as %m/%d/%y.
 .It Cm \&%e
-the same as
-.Cm \&%d .
+the day of month [1,31];
+leading spaces or zeros are permitted but not required.
 .It Cm \&%F
 the date as %Y-%m-%d
 (the
Index: lib/libc/time/strptime.c
===================================================================
RCS file: /cvs/src/lib/libc/time/strptime.c,v
retrieving revision 1.24
diff -u -p -r1.24 strptime.c
--- lib/libc/time/strptime.c 22 Jan 2019 11:09:03 -0000 1.24
+++ lib/libc/time/strptime.c 17 Feb 2019 20:26:23 -0000
@@ -254,8 +254,11 @@ literal:
  century = i * 100;
  break;
 
- case 'd': /* The day of month. */
- case 'e':
+ case 'e': /* The day of month. */
+ if (isspace(*bp))
+ bp++;
+ /* FALLTHROUGH */
+ case 'd':
  _LEGAL_ALT(_ALT_O);
  if (!(_conv_num(&bp, &tm->tm_mday, 1, 31)))
  return (NULL);

Reply | Threaded
Open this post in threaded view
|

Re: strptime: %e and leading space

Ted Unangst-6
Klemens Nanni wrote:
> I came across this thanks to a nicely detailed bug report about
> mail/isync's `CopyArrivalDate' feature, which uses strptime(3).  Quote:
>
> The IMAP protocol specifies a date format beginning with a
> day-of-month space-padded to two characters. The %d specifier in
> glibc's strptime(3) will consume a space-padded day of month,
> but OpenBSD's %d only accepts leading zeroes.

> I've checked various implementations with regard to differences between
> both conversion specifiers:
>
> glibc %d same as %e[0], both allow multiple leading spaces[1]
> musl %d same as %e[2], no spaces allowed[3]
> Illumos %d same as %e[4], both allow one leading space
> FreeBSD %d other than %e[5], %e allows one leading space[6]

There is also posix.
http://pubs.opengroup.org/onlinepubs/9699919799/functions/strptime.html

d
The day of the month [01,31]; leading zeros shall be permitted but shall not
be required.
e
Equivalent to %d.

However, later it adds:
Several "equivalent to" formats and the special processing of white-space
characters are provided in order to ease the use of identical format strings
for strftime() and strptime().

I think this sounds like a reasonable change, but austin bugs may be another
place to bring it up. They maybe intended for %e to consume a space, but my
reading of the text is that they forgot to actually say so.

I do believe it should be possible to round trip data through these functions
with the same format string.

Reply | Threaded
Open this post in threaded view
|

Re: strptime: %e and leading space

Theo de Raadt-2
Ted Unangst <[hidden email]> wrote:

>
> I do believe it should be possible to round trip data through these functions
> with the same format string.
>

On that basis alone, I concur.

Reply | Threaded
Open this post in threaded view
|

Re: strptime: %e and leading space

Klemens Nanni-2
In reply to this post by Ted Unangst-6
On Sun, Feb 17, 2019 at 08:45:18PM -0500, Ted Unangst wrote:
> However, later it adds:
> Several "equivalent to" formats and the special processing of white-space
> characters are provided in order to ease the use of identical format strings
> for strftime() and strptime().
Well, that sounds like just enough to support what I'm heading for.

> I think this sounds like a reasonable change, but austin bugs may be another
> place to bring it up. They maybe intended for %e to consume a space, but my
> reading of the text is that they forgot to actually say so.
I'm not that familiar with standards compliance; if appropiate, maybe
someone with an account on http://austingroupbugs.net/main_page.php can
help me here?

> I do believe it should be possible to round trip data through these functions
> with the same format string.
Agreed.

Looking at postfix's source to see how the string net/isync consumes is
generated, the following seems to justify my diff even further:

        $ grep -FC5 MISSING_STRFTIME_E postfix-3.3.2/global/mail_date.c
            /*
             * First, format the date and wall-clock time. XXX The %e format (day of
             * month, leading zero replaced by blank) isn't in my POSIX book, but
             * many vendors seem to support it.
             */
        #ifdef MISSING_STRFTIME_E
        #define STRFTIME_FMT "%a, %d %b %Y %H:%M:%S "
        #else
        #define STRFTIME_FMT "%a, %e %b %Y %H:%M:%S "
        #endif

Given this feedback and the fact that the majority of implementations
already behave that way:  any OKs or objections for my diff?

Reply | Threaded
Open this post in threaded view
|

Re: strptime: %e and leading space

Todd C. Miller-3
On Tue, 19 Feb 2019 23:48:03 +0100, Klemens Nanni wrote:

> Given this feedback and the fact that the majority of implementations
> already behave that way:  any OKs or objections for my diff?

OK millert@

 - todd

Reply | Threaded
Open this post in threaded view
|

Re: strptime: %e and leading space

Jeremie Courreges-Anglas-2
On Wed, Feb 20 2019, "Todd C. Miller" <[hidden email]> wrote:
> On Tue, 19 Feb 2019 23:48:03 +0100, Klemens Nanni wrote:
>
>> Given this feedback and the fact that the majority of implementations
>> already behave that way:  any OKs or objections for my diff?
>
> OK millert@

Also ok jca@

--
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE