locate.mklocatedb broken with LC_ALL!=C

Giovanni Bechis-7
Hi,
after setting LC_ALL=en_US.UTF-8 in my environment, locate.mklocatedb seems broken;
resetting LC_ALL=C is a workaround.

$ export LC_ALL=en_US.UTF-8
$ doas /usr/libexec/locate.updatedb
sort: Illegal byte sequence
locate.mklocatedb: cannot build locate database
$ export LC_ALL=C
$ doas /usr/libexec/locate.updatedb

Should we run weekly(8) with LC_ALL=C to be sure that locate.updatedb runs correctly?

 Cheers
  Giovanni

Re: locate.mklocatedb broken with LC_ALL!=C

Stefan Sperling-5
On Sun, Oct 07, 2018 at 09:43:05AM +0200, Giovanni Bechis wrote:

> Hi,
> after setting LC_ALL=en_US.UTF-8 on my env locate.mklocatedb seems broken,
> resetting LC_ALL=C is a workaround.
>
> $ export LC_ALL=en_US.UTF-8
> $ doas /usr/libexec/locate.updatedb
> sort: Illegal byte sequence
> locate.mklocatedb: cannot build locate database
> $ export LC_ALL=C
> $ doas /usr/libexec/locate.updatedb
>
> Should we run weekly(8) with LC_ALL=C to be sure that locate.updatedb runs correctly ?

Where did you set the LC_ALL variable? The UTF-8 locale should only be enabled
on a per-user basis (e.g. in ~/.profile), not for the entire system.
There are many programs in the base system which don't [yet] support UTF-8.
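
As a rough illustration of the per-user approach described above, a fragment like the following could go in a user's ~/.profile. This is only a sketch: it reuses the variable and value from the original report, and it assumes a POSIX shell reads that file at login.

```shell
# Hypothetical ~/.profile fragment (illustration only).
# It affects this user's login sessions, not system jobs such as weekly(8).
LC_ALL=en_US.UTF-8
export LC_ALL

# Show what programs started from this shell will inherit.
echo "LC_ALL for this user: $LC_ALL"
```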

Re: locate.mklocatedb broken with LC_ALL!=C

Giovanni Bechis-7
On Sun, Oct 07, 2018 at 10:06:35AM +0200, Stefan Sperling wrote:

> On Sun, Oct 07, 2018 at 09:43:05AM +0200, Giovanni Bechis wrote:
> > Hi,
> > after setting LC_ALL=en_US.UTF-8 on my env locate.mklocatedb seems broken,
> > resetting LC_ALL=C is a workaround.
> >
> > $ export LC_ALL=en_US.UTF-8
> > $ doas /usr/libexec/locate.updatedb
> > sort: Illegal byte sequence
> > locate.mklocatedb: cannot build locate database
> > $ export LC_ALL=C
> > $ doas /usr/libexec/locate.updatedb
> >
> > Should we run weekly(8) with LC_ALL=C to be sure that locate.updatedb runs correctly ?
>
> Where did you set the LC_ALL variable? The UTF-8 locale should only be enabled
> on a per-user basis (e.g. in ~/.profile), not for the entire system.
> There are many programs in the base system which don't [yet] support UTF-8.
Thinking about it more, I ran locate.updatedb as my own user (with LC_ALL set in
.profile) so that I would have a locate database as soon as possible after a new install.
Maybe add a note to locate.updatedb(8)?
 Cheers
  Giovanni

Re: locate.mklocatedb broken with LC_ALL!=C

Marc Espie-2
In reply to this post by Giovanni Bechis-7
On Sun, Oct 07, 2018 at 09:43:05AM +0200, Giovanni Bechis wrote:

> Hi,
> after setting LC_ALL=en_US.UTF-8 on my env locate.mklocatedb seems broken,
> resetting LC_ALL=C is a workaround.
>
> $ export LC_ALL=en_US.UTF-8
> $ doas /usr/libexec/locate.updatedb
> sort: Illegal byte sequence
> locate.mklocatedb: cannot build locate database
> $ export LC_ALL=C
> $ doas /usr/libexec/locate.updatedb
>
> Should we run weekly(8) with LC_ALL=C to be sure that locate.updatedb runs correctly ?
>
>  Cheers
>   Giovanni

Fixing locate.mklocatedb looks much better.

Specifically, the only part that cares about
locale is sort, and fixing it there is definitely correct,
since it's not run on a UTF-8 file.

Re: locate.mklocatedb broken with LC_ALL!=C

Todd C. Miller-2
On Sun, 07 Oct 2018 17:08:06 +0200, Marc Espie wrote:

> Specifically, the only part that cares about
> locale is sort, and it's definitely correct in fixing
> it's not run on an utf-8 file.

Agreed.  How about the following?

 - todd

Index: usr.bin/locate/locate/mklocatedb.sh
===================================================================
RCS file: /cvs/src/usr.bin/locate/locate/mklocatedb.sh,v
retrieving revision 1.13
diff -u -p -u -r1.13 mklocatedb.sh
--- usr.bin/locate/locate/mklocatedb.sh 18 Mar 2007 20:13:49 -0000 1.13
+++ usr.bin/locate/locate/mklocatedb.sh 8 Oct 2018 02:34:52 -0000
@@ -66,7 +66,8 @@ filelist=`mktemp ${TMPDIR=/tmp}/_filelis
 }
 trap 'rm -f $bigrams $filelist' 0 1 2 3 5 10 15
 
-if $sortcmd $sortopt > $filelist; then
+# Run sort in the C locale or binary data may be interpreted as UTF-8
+if LC_ALL=C $sortcmd $sortopt > $filelist; then
         $bigram < $filelist | $sort -nr |
                 awk -Ft 'BEGIN { ORS = "" } NR <= 128 { print $2 }' > $bigrams &&
         $code $bigrams < $filelist
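
For readers unfamiliar with the idiom used in the diff, prefixing a command with LC_ALL=C sets the variable for that one command only; the rest of the script keeps whatever locale the caller had. The following is a minimal sketch of that behaviour, assuming a POSIX shell. The path names and the `first` variable are made up for illustration and do not come from mklocatedb.sh.

```shell
#!/bin/sh
# Illustrative sketch only; the path names below are made up.

# Pretend the caller has a UTF-8 locale, as in the original report.
LC_ALL=en_US.UTF-8
export LC_ALL

# One name contains the byte 0xE9 (octal 351), which is not valid UTF-8 by itself.
# The LC_ALL=C prefix applies to this single sort invocation, so sort compares
# raw bytes instead of trying to decode them as UTF-8.
first=$(printf '/tmp/b\n/tmp/caf\351\n/tmp/a\n' | LC_ALL=C sort | head -n 1)

# The prefix assignment did not leak: the script still has the caller's value.
echo "first entry: $first"
echo "LC_ALL in the script: $LC_ALL"
```

Scoping the override to sort, rather than exporting LC_ALL=C for the whole script, leaves the locale of every other command in the pipeline untouched.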

Re: locate.mklocatedb broken with LC_ALL!=C

Giovanni Bechis-7
On Sun, Oct 07, 2018 at 08:35:28PM -0600, Todd C. Miller wrote:
> On Sun, 07 Oct 2018 17:08:06 +0200, Marc Espie wrote:
>
> > Specifically, the only part that cares about
> > locale is sort, and it's definitely correct in fixing
> > it's not run on an utf-8 file.
>
> Agreed.  How about the following?
>
Works for me, ok giovanni@
 Cheers & Thanks
  Giovanni

>  - todd
>
> Index: usr.bin/locate/locate/mklocatedb.sh
> ===================================================================
> RCS file: /cvs/src/usr.bin/locate/locate/mklocatedb.sh,v
> retrieving revision 1.13
> diff -u -p -u -r1.13 mklocatedb.sh
> --- usr.bin/locate/locate/mklocatedb.sh 18 Mar 2007 20:13:49 -0000 1.13
> +++ usr.bin/locate/locate/mklocatedb.sh 8 Oct 2018 02:34:52 -0000
> @@ -66,7 +66,8 @@ filelist=`mktemp ${TMPDIR=/tmp}/_filelis
>  }
>  trap 'rm -f $bigrams $filelist' 0 1 2 3 5 10 15
>  
> -if $sortcmd $sortopt > $filelist; then
> +# Run sort in the C locale or binary data may be interpreted as UTF-8
> +if LC_ALL=C $sortcmd $sortopt > $filelist; then
>          $bigram < $filelist | $sort -nr |
>                  awk -Ft 'BEGIN { ORS = "" } NR <= 128 { print $2 }' > $bigrams &&
>          $code $bigrams < $filelist

Re: locate.mklocatedb broken with LC_ALL!=C

Giovanni Bechis-7
In reply to this post by Todd C. Miller-2
Ping...
Is there any issue with millert@'s diff?
 Giovanni

On Sun, Oct 07, 2018 at 08:35:28PM -0600, Todd C. Miller wrote:

> On Sun, 07 Oct 2018 17:08:06 +0200, Marc Espie wrote:
>
> > Specifically, the only part that cares about
> > locale is sort, and it's definitely correct in fixing
> > it's not run on an utf-8 file.
>
> Agreed.  How about the following?
>
>  - todd
>
> Index: usr.bin/locate/locate/mklocatedb.sh
> ===================================================================
> RCS file: /cvs/src/usr.bin/locate/locate/mklocatedb.sh,v
> retrieving revision 1.13
> diff -u -p -u -r1.13 mklocatedb.sh
> --- usr.bin/locate/locate/mklocatedb.sh 18 Mar 2007 20:13:49 -0000 1.13
> +++ usr.bin/locate/locate/mklocatedb.sh 8 Oct 2018 02:34:52 -0000
> @@ -66,7 +66,8 @@ filelist=`mktemp ${TMPDIR=/tmp}/_filelis
>  }
>  trap 'rm -f $bigrams $filelist' 0 1 2 3 5 10 15
>  
> -if $sortcmd $sortopt > $filelist; then
> +# Run sort in the C locale or binary data may be interpreted as UTF-8
> +if LC_ALL=C $sortcmd $sortopt > $filelist; then
>          $bigram < $filelist | $sort -nr |
>                  awk -Ft 'BEGIN { ORS = "" } NR <= 128 { print $2 }' > $bigrams &&
>          $code $bigrams < $filelist

Re: locate.mklocatedb broken with LC_ALL!=C

Marc Espie-2
On Fri, Feb 15, 2019 at 07:58:48PM +0100, Giovanni Bechis wrote:

> ping...
> any possible issue with millert@ diff ?
>  Giovanni
>
> On Sun, Oct 07, 2018 at 08:35:28PM -0600, Todd C. Miller wrote:
> > On Sun, 07 Oct 2018 17:08:06 +0200, Marc Espie wrote:
> >
> > > Specifically, the only part that cares about
> > > locale is sort, and it's definitely correct in fixing
> > > it's not run on an utf-8 file.
> >
> > Agreed.  How about the following?
> >
> >  - todd
> >
> > Index: usr.bin/locate/locate/mklocatedb.sh
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/locate/locate/mklocatedb.sh,v
> > retrieving revision 1.13
> > diff -u -p -u -r1.13 mklocatedb.sh
> > --- usr.bin/locate/locate/mklocatedb.sh 18 Mar 2007 20:13:49 -0000 1.13
> > +++ usr.bin/locate/locate/mklocatedb.sh 8 Oct 2018 02:34:52 -0000
> > @@ -66,7 +66,8 @@ filelist=`mktemp ${TMPDIR=/tmp}/_filelis
> >  }
> >  trap 'rm -f $bigrams $filelist' 0 1 2 3 5 10 15
> >  
> > -if $sortcmd $sortopt > $filelist; then
> > +# Run sort in the C locale or binary data may be interpreted as UTF-8
> > +if LC_ALL=C $sortcmd $sortopt > $filelist; then
> >          $bigram < $filelist | $sort -nr |
> >                  awk -Ft 'BEGIN { ORS = "" } NR <= 128 { print $2 }' > $bigrams &&
> >          $code $bigrams < $filelist


Oh, I thought it had been committed ages ago.