sort: don't do top level comparison when invoked with -c


Richard Ipsum
Hi,

I found a bug in OpenBSD's sort utility, related to a previous bug I found.[1]
The fix I provided for that bug excluded the top level comparison when -k
was in use. Recently I discovered that there are other cases where OpenBSD's
sort does not produce the correct results. I've appended these test cases
below:

Given an input file "input.txt", containing:

-aaaa
-AAAA
+bbbb
+BBBB
=cccc
=CCCC

For each of the following invocations, sort should exit with status 0, but
OpenBSD's sort currently reports disorder:

OpenBSD:
% sort -c -d -f input.txt
sort: input.txt:2: disorder: -AAAA

% sort -c -f -i input.txt
sort: input.txt:2: disorder: -AAAA

GNU:
% sort -c -d -f input.txt
% echo $?
0

% sort -c -f -i input.txt
% echo $?
0

After thinking about it for a while, I don't see why the second, top-level
comparison is needed at all for -c. In the general case a top-level
comparison acts as a "tiebreaker" for lines whose keys compare equal, but
-c doesn't need such a tiebreaker: it isn't really sorting the input, just
detecting disorder.

The following patch seems to fix the issue without causing any regressions
as far as I am able to determine.

Thanks,
Richard

[1]: https://marc.info/?l=openbsd-tech&m=157755445524793&w=2

diff --git usr.bin/sort/file.c usr.bin/sort/file.c
index d3b97f5b2df..a803fc71fec 100644
--- usr.bin/sort/file.c
+++ usr.bin/sort/file.c
@@ -384,17 +384,7 @@ check(const char *fn)
  }
  int cmp = key_coll(ka2, ka1, 0);
  if (debug_sort)
- printf("; cmp1=%d", cmp);
-
- if (!cmp && sort_opts_vals.complex_sort &&
-    !(sort_opts_vals.uflag) && !(sort_opts_vals.sflag) &&
-    !(sort_opts_vals.kflag)) {
- cmp = top_level_str_coll(s2, s1);
- if (debug_sort)
- printf("; cmp2=%d", cmp);
- }
- if (debug_sort)
- printf("\n");
+ printf("; cmp1=%d\n", cmp);
 
  if ((sort_opts_vals.uflag && (cmp <= 0)) || (cmp < 0)) {
  if (!(sort_opts_vals.csilentflag)) {

Re: sort: don't do top level comparison when invoked with -c

Todd C. Miller
GNU sort on Linux behaves the same as the OpenBSD sort when run in
the C locale.

$ LANG=C sort -c -d -f input.txt
sort: input.txt:2: disorder: -AAAA

$ LANG=C sort -c -d -i input.txt
sort: input.txt:2: disorder: -AAAA

Since our C library doesn't really support other locales, I think
this is the expected behavior.

 - todd

Re: sort: don't do top level comparison when invoked with -c

Richard Ipsum
On Mon, Mar 23, 2020 at 09:41:16AM -0600, Todd C. Miller wrote:

> GNU sort on Linux behaves the same as the OpenBSD sort when run in
> the C locale.
>
> $ LANG=C sort -c -d -f input.txt
> sort: input.txt:2: disorder: -AAAA
>
> $ LANG=C sort -c -d -i input.txt
> sort: input.txt:2: disorder: -AAAA
>
> Since our C library doesn't really support other locales I think
> this is the expected behavior.
>
>  - todd

It didn't occur to me to try this with the C locale.
For what it's worth, I asked on the coreutils list,
where it was suggested that the top-level comparison could be dropped
for locales that define a total ordering of all characters,
which (I think) would include the C locale.[1]

Thanks,
Richard

[1]: https://www.mail-archive.com/bug-coreutils@.../msg31342.html