bug report: ksh: incorrect "$@" handling with null (empty) IFS

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

bug report: ksh: incorrect "$@" handling with null (empty) IFS

avih
Note: originally posted to misc@ and was requested by Rafael Possamai
to post to bugs@ ( https://marc.info/?l=openbsd-misc&m=159986950219082&w=4 )

Note: I'm not following this list, nor using OpenBSD regularly.
Please CC me for any comment.


uname -a (running inside virtual box):
OpenBSD foo.my.domain 6.7 GENERIC.MP#182 amd64


Run the following command:
/bin/sh -c 'IFS=; args() { printf "[%s]" "$@"; }; args 1 2 3'

Expected result:
[1][2][3]

Actual result:
[1 2 3]


It seems that "$@" in [k]sh with IFS as null combines the arguments
as if it was "$*" when IFS is unset or begins with space, while in
fact the main difference between (quoted) "$@" and $* is that the
former should retain the individual arguments regardless of IFS,
while $* combines them with the first char of IFS (or space if IFS
is unset).

The same issue exists also at pdksh 5.2.14, but not at any other
shell which I tested (dash 5.8 up to master, busybox-ash, bash, ksh93,
mksh, bosh, and others).

As far as I can interpret the POSIX specifications and as far as other
shells which I tested behave, it seems to be a bug in pdksh which is
inherited by OpenBSD ksh.


As an anecdote why this behavior is undesirable, consider the pattern:
while IFS= read -r line; do ...; done

and imagine that `read' was implemented as a function which then
calls another function, e.g.: read() { read_internal "$@"; }

With the current behavior, read_internal will see one argument
"-r line" instead of two: "-r" and "line", and could never work
correctly.

Example of this flow (ignore the general badness of echo):
sh -c 'foo() { for x; do echo [$x]; done; }; bar() { foo "$@"; }; IFS=; bar -r line'

prints [-r line] as one argument instead of [-r]<\n>[line]<\n>


Best regards,
Avi Halachmi


I was requested to attach dmesg, so here it is:
-----------------------------------------------

OpenBSD 6.7 (GENERIC.MP) #182: Thu May  7 11:11:58 MDT 2020
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4278124544 (4079MB)
avail mem = 4135845888 (3944MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xe1000 (10 entries)
bios0: vendor innotek GmbH version "VirtualBox" date 12/01/2006
bios0: innotek GmbH VirtualBox
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S5
acpi0: tables DSDT FACP APIC SSDT
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, 2208.44 MHz, 06-9e-0a
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,RDRAND,NXE,RDTSCP,LONG,LAHF,ABM,3DNOWP,ITSC,FSGSBASE,AVX2,INVPCID,RDSEED,CLFLUSHOPT,MD_CLEAR,L1DF,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: CPU supports MTRRs but not enabled by BIOS
cpu0: apic clock running at 1000MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, 2208.44 MHz, 06-9e-0a
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,RDRAND,NXE,RDTSCP,LONG,LAHF,ABM,3DNOWP,ITSC,FSGSBASE,AVX2,INVPCID,RDSEED,CLFLUSHOPT,MD_CLEAR,L1DF,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, 2208.67 MHz, 06-9e-0a
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,RDRAND,NXE,RDTSCP,LONG,LAHF,ABM,3DNOWP,ITSC,FSGSBASE,AVX2,INVPCID,RDSEED,CLFLUSHOPT,MD_CLEAR,L1DF,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, 2209.88 MHz, 06-9e-0a
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,RDRAND,NXE,RDTSCP,LONG,LAHF,ABM,3DNOWP,ITSC,FSGSBASE,AVX2,INVPCID,RDSEED,CLFLUSHOPT,MD_CLEAR,L1DF,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 0, core 3, package 0
cpu4 at mainbus0: apid 4 (application processor)
cpu4: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, 2209.71 MHz, 06-9e-0a
cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,RDRAND,NXE,RDTSCP,LONG,LAHF,ABM,3DNOWP,ITSC,FSGSBASE,AVX2,INVPCID,RDSEED,CLFLUSHOPT,MD_CLEAR,L1DF,MELTDOWN
cpu4: 256KB 64b/line 8-way L2 cache
cpu4: smt 0, core 0, package 1
cpu5 at mainbus0: apid 5 (application processor)
cpu5: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, 2218.69 MHz, 06-9e-0a
cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,RDRAND,NXE,RDTSCP,LONG,LAHF,ABM,3DNOWP,ITSC,FSGSBASE,AVX2,INVPCID,RDSEED,CLFLUSHOPT,MD_CLEAR,L1DF,MELTDOWN
cpu5: 256KB 64b/line 8-way L2 cache
cpu5: smt 0, core 1, package 1
ioapic0 at mainbus0: apid 6 pa 0xfec00000, version 20, 24 pins, remapped
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpicpu2 at acpi0: C1(@1 halt!)
acpicpu3 at acpi0: C1(@1 halt!)
acpicpu4 at acpi0: C1(@1 halt!)
acpicpu5 at acpi0: C1(@1 halt!)
acpipci0 at acpi0 PCI0: 0x00000000 0x00000011 0x00000001
acpibat0 at acpi0: BAT0 model "1" serial 0 type VBOX oem "innotek"
acpiac0 at acpi0: AC unit online
acpivideo0 at acpi0: GFX0
cpu0: using VERW MDS workaround (except on vmm entry)
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371AB IDE" rev 0x01: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
wd0 at pciide0 channel 0 drive 0: <VBOX HARDDISK>
wd0: 128-sector PIO, LBA, 16384MB, 33554432 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
atapiscsi0 at pciide0 channel 1 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <VBOX, CD-ROM, 1.0> removable
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
vga1 at pci0 dev 2 function 0 "InnoTek VirtualBox Graphics Adapter" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
em0 at pci0 dev 3 function 0 "Intel 82540EM" rev 0x02: apic 6 int 19, address 08:00:27:54:b0:69
"InnoTek VirtualBox Guest Service" rev 0x00 at pci0 dev 4 function 0 not configured
auich0 at pci0 dev 5 function 0 "Intel 82801AA AC97" rev 0x01: apic 6 int 21, ICH
ac97: codec id 0x83847600 (SigmaTel STAC9700)
audio0 at auich0
ohci0 at pci0 dev 6 function 0 "Apple Intrepid USB" rev 0x00: apic 6 int 22, version 1.0
piixpm0 at pci0 dev 7 function 0 "Intel 82371AB Power" rev 0x08: apic 6 int 23
iic0 at piixpm0
ehci0 at pci0 dev 11 function 0 "Intel 82801FB USB" rev 0x00: apic 6 int 19
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
usb1 at ohci0: USB revision 1.0
uhub1 at usb1 configuration 1 interface 0 "Apple OHCI root hub" rev 1.00/1.00 addr 1
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on wd0a (4ef32fd6459c69c7.a) swap on wd0b dump on wd0b
WARNING: clock gained 4 days
WARNING: CHECK AND RESET THE DATE!

Reply | Threaded
Open this post in threaded view
|

Re: bug report: ksh: incorrect "$@" handling with null (empty) IFS

Theo Buehler-3
On Sat, Sep 12, 2020 at 06:58:47AM +0000, avih wrote:

> Note: originally posted to misc@ and was requested by Rafael Possamai
> to post to bugs@ ( https://marc.info/?l=openbsd-misc&m=159986950219082&w=4 )
>
> Note: I'm not following this list, nor using OpenBSD regularly.
> Please CC me for any comment.
>
>
> uname -a (running inside virtual box):
> OpenBSD foo.my.domain 6.7 GENERIC.MP#182 amd64
>
>
> Run the following command:
> /bin/sh -c 'IFS=; args() { printf "[%s]" "$@"; }; args 1 2 3'
>
> Expected result:
> [1][2][3]
>
> Actual result:
> [1 2 3]
>
>
> It seems that "$@" in [k]sh with IFS as null combines the arguments
> as if it was "$*" when IFS is unset or begins with space, while in
> fact the main difference between (quoted) "$@" and $* is that the
> former should retain the individual arguments regardless of IFS,
> while $* combines them with the first char of IFS (or space if IFS
> is unset).

Thanks for the report.  This is known and a patch by Martijn Dekker from
2016 exists, but it never made it into the tree despite being tested in
snapshots for a while. I don't know/remember the reason - I think it
just stalled.

Below is the final version of the patch, rebased against -current.

Here's Martijn's original email
(https://marc.info/?l=openbsd-tech&m=145704771329065&w=2)

> Hi all,
>
> I'm new here and posting at Theo de Raadt's request. I'm developing a
> general-purpose cross-platform library for the POSIX shell language and
> in the process I encounter lots of bugs in various shells. I will be
> posting here a few times with some patches and bug reports against
> OpenBSD ksh. Here is the first.
>
> One uncommon but useful way of writing shell scripts is to start off by
> disabling field/word splitting (IFS='') and pathname expansion/globbing
> (set -f), re-enabling either or both only for the commands that need
> them, e.g. within a subshell. This helps avoid a lot of snags with field
> splitting and globbing if you forget to quote a variable somewhere,
> adding to the general robustness of a script. (In fact it eliminates
> much of the need to quote variable/parameter expansions, with empty
> removal remaining as the only issue.)
>
> Unfortunately OpenBSD ksh (like all pdksh variants except mksh) has a
> POSIX compliance bug that is a show stopper for this approach: "$@" does
> not generate words (arguments) if IFS is empty. As a result, the
> separate command arguments represented by "$@" become a single argument.
> So passing on an intact set of positional parameters to a command or
> function is impossible with field splitting disabled.
>
> Of course this is illogical: the quoted special parameter "$@" generates
> zero or more words, it doesn't split any words, so the contents of IFS
> (or lack thereof) should be neither here nor there. It's old ksh88
> behaviour copied by the original pdksh, but it violates POSIX and it has
> been fixed many years ago in ksh93 and all other POSIX shells.
>
> As I mentioned, mksh fixed it too. I seem to have successfully
> backported the mksh fix to OpenBSD ksh. The fix is not too complicated,
> but not trivial either, so the MirOS licence would have applied, which I
> understand may not be acceptable here.
>
> So I emailed the author, Thorsten Glaser, explaining the situation.
> After some amicable discussion, he granted me a personal licence for
> this particular patch, authorising me to sublicence it in whatever way I
> please. (Proof is available on request.)
>
> Under that authorisation, I hereby dedicate the attached patch to the
> public domain. In localities where that is not valid, I hereby grant
> unlimited permission to use, copy, modify and distribute it, to the
> extent permitted by law.
>
> So this patch makes quoted "$@" act according to the standard even when
> IFS is empty. Quoted "$*" is unchanged. For the unspecified (not
> standardised) cases of unquoted $@ and $*, this patch makes ksh act like
> AT&T ksh93, bash, zsh and (d)ash, which seems safest from a
> compatibility point of view.
>
> The patch is against the OpenBSD 5.8-RELEASE sources. I hope that's ok.
> I also added an update for the relevant regression test.
>
> Thanks,
>
> - M.


Index: bin/ksh/eval.c
===================================================================
RCS file: /var/cvs/src/bin/ksh/eval.c,v
retrieving revision 1.65
diff -u -p -r1.65 eval.c
--- bin/ksh/eval.c 28 Jun 2019 13:34:59 -0000 1.65
+++ bin/ksh/eval.c 12 Sep 2020 08:29:56 -0000
@@ -47,6 +47,8 @@ typedef struct Expand {
 #define IFS_WORD 0 /* word has chars (or quotes) */
 #define IFS_WS 1 /* have seen IFS white-space */
 #define IFS_NWS 2 /* have seen IFS non-white-space */
+#define IFS_IWS 3 /* beginning of word, ignore IFS white-space */
+#define IFS_QUOTE 4 /* beg.w/quote, becomes IFS_WORD unless "$@" */
 
 static int varsub(Expand *, char *, char *, int *, int *);
 static int comsub(Expand *, char *);
@@ -217,7 +219,17 @@ expand(char *cp, /* input word */
  c = *sp++;
  break;
  case OQUOTE:
- word = IFS_WORD;
+ switch (word) {
+ case IFS_QUOTE:
+ /* """something */
+ word = IFS_WORD;
+ break;
+ case IFS_WORD:
+ break;
+ default:
+ word = IFS_QUOTE;
+ break;
+ }
  tilde_ok = 0;
  quote = 1;
  continue;
@@ -297,6 +309,8 @@ expand(char *cp, /* input word */
  if (f&DOBLANK)
  doblank++;
  tilde_ok = 0;
+ if (word == IFS_QUOTE && type != XNULLSUB)
+ word = IFS_WORD;
  if (type == XBASE) { /* expand? */
  if (!st->next) {
  SubType *newst;
@@ -358,6 +372,11 @@ expand(char *cp, /* input word */
  f |= DOTEMP_;
  /* FALLTHROUGH */
  default:
+ /* '-' '+' '?' */
+ if (quote)
+ word = IFS_WORD;
+ else if (dp == Xstring(ds, dp))
+ word = IFS_IWS;
  /* Enable tilde expansion */
  tilde_ok = 1;
  f |= DOTILDE;
@@ -387,10 +406,17 @@ expand(char *cp, /* input word */
  */
  x.str = trimsub(str_val(st->var),
  dp, st->stype);
- if (x.str[0] != '\0' || st->quote)
+ if (x.str[0] != '\0') {
+ word = IFS_IWS;
  type = XSUB;
- else
+ } else if (quote) {
+ word = IFS_WORD;
+ type = XSUB;
+ } else {
+ if (dp == Xstring(ds, dp))
+ word = IFS_IWS;
  type = XNULLSUB;
+ }
  if (f&DOBLANK)
  doblank++;
  st = st->prev;
@@ -422,6 +448,10 @@ expand(char *cp, /* input word */
  if (f&DOBLANK)
  doblank++;
  st = st->prev;
+ if (quote || !*x.str)
+ word = IFS_WORD;
+ else
+ word = IFS_IWS;
  continue;
  case '?':
     {
@@ -463,12 +493,8 @@ expand(char *cp, /* input word */
  type = XBASE;
  if (f&DOBLANK) {
  doblank--;
- /* not really correct: x=; "$x$@" should
- * generate a null argument and
- * set A; "${@:+}" shouldn't.
- */
- if (dp == Xstring(ds, dp))
- word = IFS_WS;
+ if (dp == Xstring(ds, dp) && word != IFS_WORD)
+ word = IFS_IWS;
  }
  continue;
 
@@ -503,7 +529,12 @@ expand(char *cp, /* input word */
  if (c == 0) {
  if (quote && !x.split)
  continue;
+ if (!quote && word == IFS_WS)
+ continue;
+ /* this is so we don't terminate */
  c = ' ';
+ /* now force-emit a word */
+ goto emit_word;
  }
  if (quote && x.split) {
  /* terminate word for "$@" */
@@ -554,15 +585,15 @@ expand(char *cp, /* input word */
  * -----------------------------------
  * IFS_WORD w/WS w/NWS w
  * IFS_WS -/WS w/NWS -
- * IFS_NWS -/NWS w/NWS w
+ * IFS_NWS -/NWS w/NWS -
+ * IFS_IWS -/WS w/NWS -
  *   (w means generate a word)
- * Note that IFS_NWS/0 generates a word (at&t ksh
- * doesn't do this, but POSIX does).
  */
- if (word == IFS_WORD ||
-    (!ctype(c, C_IFSWS) && c && word == IFS_NWS)) {
- char *p;
-
+ if ((word == IFS_WORD) || (word == IFS_QUOTE) || (c &&
+    (word == IFS_IWS || word == IFS_NWS) &&
+    !ctype(c, C_IFSWS))) {
+ char *p;
+ emit_word:
  *dp++ = '\0';
  p = Xclose(ds, dp);
  if (fdo & DOBRACE_)
Index: regress/bin/ksh/ifs.t
===================================================================
RCS file: /var/cvs/src/regress/bin/ksh/ifs.t,v
retrieving revision 1.1
diff -u -p -r1.1 ifs.t
--- regress/bin/ksh/ifs.t 2 Dec 2013 20:39:44 -0000 1.1
+++ regress/bin/ksh/ifs.t 12 Sep 2020 08:29:56 -0000
@@ -45,10 +45,10 @@ stdin:
  showargs 3 $@
  showargs 4 "$@"
 expected-stdout:
- <1> <A B C>
+ <1> <A> <B> <C>
  <2> <ABC>
- <3> <A B C>
- <4> <A B C>
+ <3> <A> <B> <C>
+ <4> <A> <B> <C>
 ---
 
 name: IFS-space-colon-1

Reply | Threaded
Open this post in threaded view
|

Re: bug report: ksh: incorrect "$@" handling with null (empty) IFS

avih
 

> On Saturday, September 12, 2020, 11:39:57 AM GMT+3, Theo Buehler <[hidden email]> wrote:
>
>
> On Sat, Sep 12, 2020 at 06:58:47AM +0000, avih wrote:
> > Run the following command:
> > /bin/sh -c 'IFS=; args() { printf "[%s]" "$@"; }; args 1 2 3'
> >
> > Expected result:
> > [1][2][3]
> >
> > Actual result:
> > [1 2 3]
> >
>
> Thanks for the report. This is known and a patch by Martijn Dekker from
> 2016 exists, but it never made it into the tree despite being tested in
> snapshots for a while. I don't know/remember the reason - I think it
> just stalled.


Thanks for the quick and on-topic reply. I guess the ball is at your
court now.

Cheers,
- avih