join(1) on NULL values

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

join(1) on NULL values

Martijn van Duren-5
When reviewing otto@'s diff I found the following, which was also part  
of the r1.4 import. I managed to contact michaels and he told me that he
couldn't recollect anything other then what was in the commit message
and other local changes were most likely not intentional.

I tested this behaviour with FreeBSD, NetBSD, gjoin and with csrg join
prior to the current structure and the firs 3 all include a link on join
and the latter segfaults if the join field is not available.

POSIX doesn't mention how the code should work if a particular field is
not available, so we get no guidance from there.

Personally I think our current behaviour is more logical, most likely
because I'm used to SQL joins, but considering no one else does it it
might be wise to get back into the fold.

Note that this still is in line with fixing PR-1356 and the other BSDs
need to swap the return 1 and return -1 case to make things work on
their versions of join.

$ join -j 2 -e empty /tmp/z1 /tmp/z2
d b c
$ ./obj/join -j 2 -e empty /tmp/z1 /tmp/z2
empty a a
d b c

thoughts?

martijn@

Index: join.c
===================================================================
RCS file: /cvs/src/usr.bin/join/join.c,v
retrieving revision 1.29
diff -u -p -r1.29 join.c
--- join.c 18 Oct 2018 09:36:48 -0000 1.29
+++ join.c 18 Oct 2018 14:43:10 -0000
@@ -362,10 +362,10 @@ int
 cmp(LINE *lp1, u_long fieldno1, LINE *lp2, u_long fieldno2)
 {
  if (lp1->fieldcnt <= fieldno1)
- return (-1);
- else if (lp2->fieldcnt <= fieldno2)
- return (1);
- return (strcmp(lp1->fields[fieldno1], lp2->fields[fieldno2]));
+ return lp2->fieldcnt <= fieldno2 ? 0 : -1;
+ if (lp2->fieldcnt <= fieldno2)
+ return 1;
+ return strcmp(lp1->fields[fieldno1], lp2->fields[fieldno2]);
 }
 
 void

Reply | Threaded
Open this post in threaded view
|

Re: join(1) on NULL values

Martijn van Duren-5
On 10/18/18 16:51, Martijn van Duren wrote:

> When reviewing otto@'s diff I found the following, which was also part  
> of the r1.4 import. I managed to contact michaels and he told me that he
> couldn't recollect anything other then what was in the commit message
> and other local changes were most likely not intentional.
>
> I tested this behaviour with FreeBSD, NetBSD, gjoin and with csrg join
> prior to the current structure and the firs 3 all include a link on join
> and the latter segfaults if the join field is not available.
>
> POSIX doesn't mention how the code should work if a particular field is
> not available, so we get no guidance from there.
>
> Personally I think our current behaviour is more logical, most likely
> because I'm used to SQL joins, but considering no one else does it it
> might be wise to get back into the fold.
>
> Note that this still is in line with fixing PR-1356 and the other BSDs
> need to swap the return 1 and return -1 case to make things work on
> their versions of join.

And for the kids playing along at home, this works better with the
test files included:
$ cat /tmp/z1
a
b d
$ cat /tmp/z2
a
c d

>
> $ join -j 2 -e empty /tmp/z1 /tmp/z2
> d b c
> $ ./obj/join -j 2 -e empty /tmp/z1 /tmp/z2
> empty a a
> d b c
>
> thoughts?
>
> martijn@
>
> Index: join.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/join/join.c,v
> retrieving revision 1.29
> diff -u -p -r1.29 join.c
> --- join.c 18 Oct 2018 09:36:48 -0000 1.29
> +++ join.c 18 Oct 2018 14:43:10 -0000
> @@ -362,10 +362,10 @@ int
>  cmp(LINE *lp1, u_long fieldno1, LINE *lp2, u_long fieldno2)
>  {
>   if (lp1->fieldcnt <= fieldno1)
> - return (-1);
> - else if (lp2->fieldcnt <= fieldno2)
> - return (1);
> - return (strcmp(lp1->fields[fieldno1], lp2->fields[fieldno2]));
> + return lp2->fieldcnt <= fieldno2 ? 0 : -1;
> + if (lp2->fieldcnt <= fieldno2)
> + return 1;
> + return strcmp(lp1->fields[fieldno1], lp2->fields[fieldno2]);
>  }
>  
>  void
>

Reply | Threaded
Open this post in threaded view
|

Re: join(1) on NULL values

Otto Moerbeek
On Thu, Oct 18, 2018 at 04:57:23PM +0200, Martijn van Duren wrote:

> On 10/18/18 16:51, Martijn van Duren wrote:
> > When reviewing otto@'s diff I found the following, which was also part  
> > of the r1.4 import. I managed to contact michaels and he told me that he
> > couldn't recollect anything other then what was in the commit message
> > and other local changes were most likely not intentional.
> >
> > I tested this behaviour with FreeBSD, NetBSD, gjoin and with csrg join
> > prior to the current structure and the firs 3 all include a link on join
> > and the latter segfaults if the join field is not available.
> >
> > POSIX doesn't mention how the code should work if a particular field is
> > not available, so we get no guidance from there.
> >
> > Personally I think our current behaviour is more logical, most likely
> > because I'm used to SQL joins, but considering no one else does it it
> > might be wise to get back into the fold.
> >
> > Note that this still is in line with fixing PR-1356 and the other BSDs
> > need to swap the return 1 and return -1 case to make things work on
> > their versions of join.
>
> And for the kids playing along at home, this works better with the
> test files included:
> $ cat /tmp/z1
> a
> b d
> $ cat /tmp/z2
> a
> c d
> >
> > $ join -j 2 -e empty /tmp/z1 /tmp/z2
> > d b c
> > $ ./obj/join -j 2 -e empty /tmp/z1 /tmp/z2
> > empty a a
> > d b c
> >
> > thoughts?

I think we want this. Don't forget to update the regress test and keep
an eye open for regressions in real-life usage.

        -Otto


> >
> > martijn@
> >
> > Index: join.c
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/join/join.c,v
> > retrieving revision 1.29
> > diff -u -p -r1.29 join.c
> > --- join.c 18 Oct 2018 09:36:48 -0000 1.29
> > +++ join.c 18 Oct 2018 14:43:10 -0000
> > @@ -362,10 +362,10 @@ int
> >  cmp(LINE *lp1, u_long fieldno1, LINE *lp2, u_long fieldno2)
> >  {
> >   if (lp1->fieldcnt <= fieldno1)
> > - return (-1);
> > - else if (lp2->fieldcnt <= fieldno2)
> > - return (1);
> > - return (strcmp(lp1->fields[fieldno1], lp2->fields[fieldno2]));
> > + return lp2->fieldcnt <= fieldno2 ? 0 : -1;
> > + if (lp2->fieldcnt <= fieldno2)
> > + return 1;
> > + return strcmp(lp1->fields[fieldno1], lp2->fields[fieldno2]);
> >  }
> >  
> >  void
> >