I found a sort bug! - How to sort big files?

sort problem
Hello,

----------
# uname -a
OpenBSD notebook.lan 5.6 GENERIC.MP#333 amd64
#
# du -sh small/
663M    small/
# ls -lah small/*.txt | wc -l
      43
#
# cd small
# ulimit -n
10000000
# sysctl | grep -i maxfiles
kern.maxfiles=1000000000
#
# grep open /etc/login.conf
        :openfiles-cur=100000:\
        :openfiles-cur=1280000:\
        :openfiles-cur=512:\
#
# sort -u *.txt -o out
Segmentation fault (core dumped)
#
----------

This happens after about a minute of running. The .txt files contain UTF-8 characters too. Each line is at most a few dozen characters long, and all the .txt files have UNIX line endings. There is enough storage, RAM, and CPU. I'm even running this as root. The .txt files total about 60,000,000 lines, which is not a big number. A reboot didn't help.

Any ideas how I can get the "sort" command to actually sort? Please help!

Thanks,

btw, this happens on other UNIX OSes too, lol... why do we have the sort command if it doesn't work?
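For input that may not fit comfortably in memory, sort implementations commonly use an external merge sort: sort fixed-size chunks in memory, write each sorted chunk (a "run") to a temporary file, then k-way merge the runs, dropping adjacent duplicates for -u. The Python sketch below illustrates only that general technique; it is not the code of OpenBSD sort or GNU sort, it compares raw bytes (roughly C-locale ordering), and the names `external_sort_unique` and `chunk_lines` are hypothetical.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_lines: List[bytes]) -> str:
    """Write one sorted chunk ("run") to a temporary file; return its path."""
    fd, path = tempfile.mkstemp(prefix="run-")
    with os.fdopen(fd, "wb") as f:
        f.writelines(sorted_lines)
    return path


def external_sort_unique(lines: Iterable[bytes], chunk_lines: int = 100_000) -> Iterator[bytes]:
    """Yield the distinct lines of `lines` in byte order.

    At most `chunk_lines` lines are held in memory while building runs
    (a hypothetical tuning knob, not a sort(1) option).
    """
    run_paths: List[str] = []
    chunk: List[bytes] = []
    for line in lines:
        # Normalize a missing final newline so runs can be re-read line by line.
        if not line.endswith(b"\n"):
            line += b"\n"
        chunk.append(line)
        if len(chunk) >= chunk_lines:
            run_paths.append(_write_run(sorted(chunk)))
            chunk = []
    if chunk:
        run_paths.append(_write_run(sorted(chunk)))

    files = [open(path, "rb") for path in run_paths]
    try:
        previous = None
        # heapq.merge lazily k-way merges the already sorted runs.
        for line in heapq.merge(*files):
            # After merging, duplicates are adjacent, so -u only compares neighbors.
            if line != previous:
                yield line
                previous = line
    finally:
        for f in files:
            f.close()
        for path in run_paths:
            os.remove(path)


if __name__ == "__main__":
    sample = [b"pear\n", b"apple\n", b"pear\n", b"fig\n", b"apple\n", b"kiwi"]
    for line in external_sort_unique(sample, chunk_lines=2):
        print(line.decode(), end="")
```

To sort a real file, pass an open binary file (such as `sys.stdin.buffer`) as `lines`. Merging every run at once needs one open file per run, which is one reason real implementations limit how many runs they merge in a single pass (GNU sort exposes this as `--batch-size`).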

Re: I found a sort bug! - How to sort big files?

Andreas Zeilmeier
On 03/14/15 12:49, sort problem wrote:

> # sort -u *.txt -o out
> Segmentation fault (core dumped)

Hi,

Have you tried the '-H' option?
The man page suggests it for files larger than 60MB.


Regards,

Andi
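As we understand it, the OpenBSD sort(1) man page of that era described -H as selecting a merge sort instead of the default radix sort. As an illustration of the algorithm only (not OpenBSD's implementation), here is a minimal in-memory merge sort followed by the adjacent-duplicate removal that -u performs on byte-string lines; `merge_sort` and `sort_unique` are hypothetical names.

```python
from typing import List, TypeVar

T = TypeVar("T")


def merge_sort(items: List[T]) -> List[T]:
    """Return a new, stably sorted list using top-down merge sort."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged: List[T] = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Taking from the left run on ties keeps the sort stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One run is exhausted; append whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def sort_unique(lines: List[bytes]) -> List[bytes]:
    """Sort byte-string lines and drop repeats, roughly like `sort -u` in the C locale."""
    out: List[bytes] = []
    for line in merge_sort(lines):
        # Equal lines are adjacent after sorting, so compare with the last kept line.
        if not out or out[-1] != line:
            out.append(line)
    return out


if __name__ == "__main__":
    data = [b"pear\n", b"apple\n", b"pear\n", "éclair\n".encode("utf-8"), b"apple\n"]
    for line in sort_unique(data):
        print(line.decode("utf-8"), end="")
```

Because the comparison is byte-wise, "éclair" sorts after the ASCII words: its first UTF-8 byte is 0xC3, which is larger than any ASCII letter.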

Re: I found a sort bug! - How to sort big files?

Stuart Henderson
In reply to this post by sort problem
On 2015-03-14, sort problem <[hidden email]> wrote:
> # sort -u *.txt -o out
> Segmentation fault (core dumped)

There are some known bugs in sort; I ran into a file it couldn't cope with a
couple of years ago too, but it doesn't happen all that often.

I think the consensus was to try and replace it with another version but
not sure what happened.

For your current problem, you could pkg_add coreutils and try gsort;
maybe it will cope with your files better.

> btw, this happens on other UNIX OS too, lol... why do we have the sort command if it doesn't work?

Normally it does work.

Re: I found a sort bug! - How to sort big files?

sort problem
In reply to this post by sort problem
o.m.g. It works.

Why doesn't sort use this by default on files larger than 60 MB?

Thanks!

-------- Original Message --------
From: Andreas Zeilmeier <[hidden email]>
Apparently from: [hidden email]
To: [hidden email]
Subject: Re: I found a sort bug! - How to sort big files?
Date: Sat, 14 Mar 2015 13:16:05 +0100

> Hi,
>
> have you tried the option '-H'?
> The manpage suggested this for files > 60MB.
>
>
> Regards,
>
> Andi

Re: I found a sort bug! - How to sort big files?

Todd C. Miller
In reply to this post by Stuart Henderson
On Sat, 14 Mar 2015 12:29:21 -0000, Stuart Henderson wrote:

> I think the consensus was to try and replace it with another version but
> not sure what happened.

I have a port of the FreeBSD sort but it is slower than our current
sort (and slower than GNU sort).

 - todd

Re: I found a sort bug! - How to sort big files?

Stuart Henderson
On 2015-03-15, Todd C. Miller <[hidden email]> wrote:
> On Sat, 14 Mar 2015 12:29:21 -0000, Stuart Henderson wrote:
>
>> I think the consensus was to try and replace it with another version but
>> not sure what happened.
>
> I have a port of the FreeBSD sort but it is slower than our current
> sort (and slower than GNU sort).

Personally I think that is a reasonable trade-off for more actively
developed code, and when I tried it on some difficult files it coped
better than our current sort (not that this small sample means much
in terms of "ability to handle every difficult file").

Re: I found a sort bug! - How to sort big files?

Otto Moerbeek
On Mon, Mar 16, 2015 at 10:20:07AM +0000, Stuart Henderson wrote:

> Personally I think that is a reasonable trade-off for more actively
> developed code, and when I tried it on some difficult files it coped
> better than our current sort (not that this small sample means much
> in terms of "ability to handle every difficult file").

Current sort(1) is unmaintainable in many ways. I say switch.

        -Otto

Re: I found a sort bug! - How to sort big files?

Paul Stoeber
In reply to this post by sort problem
> Current sort(1) is unmaintainable in many ways. I say switch.

I've seen with gdb that the current sort(1) somehow manages to make
radixsort(3) do the work when the sort key is somewhere in the middle
of the line. I don't even want to know... (and my reading
comprehension of C is too weak to go and look). Yes, switch, please!
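radixsort(3) is BSD libc's radix sort for byte strings. A textbook most-significant-byte-first (MSD) radix sort distributes strings into 256 buckets on one byte position, then recurses into each bucket on the next position. The sketch below is that textbook version, not the libc code; the optional `key` argument is a hypothetical addition showing one way to sort on a field in the middle of a line: extract the field first, then bucket on the extracted bytes.

```python
from typing import Callable, List, Optional


def msd_radix_sort(items: List[bytes],
                   key: Optional[Callable[[bytes], bytes]] = None) -> List[bytes]:
    """Stable MSD radix sort of byte strings, ordered by the bytes of key(item)."""
    if key is None:
        key = lambda s: s  # default: sort on the whole line
    return _radix_pass(list(items), key, 0)


def _radix_pass(items: List[bytes], key: Callable[[bytes], bytes], depth: int) -> List[bytes]:
    if len(items) <= 1:
        return items
    # Keys with no byte at this depth are prefixes of the others, so they sort first.
    finished: List[bytes] = []
    buckets: List[List[bytes]] = [[] for _ in range(256)]
    for item in items:
        k = key(item)  # re-evaluated at every level, for simplicity
        if depth >= len(k):
            finished.append(item)
        else:
            # Distribute on the byte at this position; input order is kept per bucket.
            buckets[k[depth]].append(item)
    result = finished
    for bucket in buckets:
        result.extend(_radix_pass(bucket, key, depth + 1))
    return result


if __name__ == "__main__":
    lines = [b"3 pear x", b"1 apple y", b"2 fig z"]
    # Sort on the second space-separated field, which sits mid-line.
    for line in msd_radix_sort(lines, key=lambda l: l.split(b" ")[1]):
        print(line.decode())
```

In practice Python's built-in `sorted()` is the sensible choice; the point here is only the byte-by-byte bucketing.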

Re: I found a sort bug! - How to sort big files?

Jan Stary
In reply to this post by Otto Moerbeek
On Mar 16 11:36:08, [hidden email] wrote:

> Current sort(1) is unmaintainable in many ways. I say switch.

Incidentally, reading up on UNIX history, I came across this:
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6771921