HS_ENCODING in localeEncoding

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

HS_ENCODING in localeEncoding

Greg Steuck
Hi Matthias,

Do you believe lang/ghc/patches/patch-libraries_base_cbits_PrelIOUtils_c
is still relevant?

I ran some test programs and both locale_charset() and
nl_langinfo(CODESET) yield sensible results (at least for UTF8). The
history of the patch seems to go back approximately 10 years, so maybe
OpenBSD locale support improved enough to consider dropping the patch?
I'll rebuild GHC without that patch locally but also wanted to check
with you in the meantime.

Thanks
Greg
--
nest.cx is Gmail hosted, use PGP: https://pgp.key-server.io/0x0B1542BD8DF5A1B0
Fingerprint: 5E2B 2D0E 1E03 2046 BEC3  4D50 0B15 42BD 8DF5 A1B0

Reply | Threaded
Open this post in threaded view
|

Re: HS_ENCODING in localeEncoding

Ingo Schwarze
Hi Greg,

take this with a shovel of salt as i didn't test the port or anything
with Haskell (and i'm unlikely to).  So this is purely from code
inspection.

Greg Steuck wrote on Sun, Feb 02, 2020 at 10:00:06PM -0800:

> Do you believe lang/ghc/patches/patch-libraries_base_cbits_PrelIOUtils_c
> is still relevant?

It may have been OK in 2015, but it looks rather bogus now.

> I ran some test programs and both locale_charset() and

The function locale_charset() appears to be part of the converters/libiconv
package, but it appears to be totally undocumented.  I failed to
find any documentation whatsoever: neither in the package nor even
with Google on the web.  While iconv(3) is in POSIX, locale_charset()
is not.  I think we should avoid using stuff when it is unclear
what it is, so consider forcing HAVE_LIBCHARSET to undefined in the
port (i'm not saying you must do this, only that it could be
considered).

> nl_langinfo(CODESET) yield sensible results (at least for UTF8).

And indeed, UTF-8 is the only non-POSIX charset that OpenBSD supports,
and there are no plans to support any others in the future.

Consequently, nl_langinfo(CODESET) will always return either "US-ASCII"
or "UTF-8" on OpenBSD.

> The
> history of the patch seems to go back approximately 10 years, so maybe
> OpenBSD locale support improved enough to consider dropping the patch?

I think so.

Three apsects of the patch look bogus in particular:

 1. Avoiding nl_langinfo(3).
    There is no longer anything wrong with that function.

 2. Inspecting HS_ENCODING.
    It appears the code wants to inspect the current locale.
    If any environment variable, then LC_ALL/LC_CTYPE/LANG
    should probably be inspected, not HS_ENCODING.
    The code in the patch gives me the impression that this is
    probably already done with setlocale(3) elsewehere in the
    program, so i'm not sure the environment should be inspected
    again.
    Also, using setlocale(3) is almost always preferable to
    manual inspection of environment variables.

 3. Setting anything to "latin1".
    Given that we don't even support that locale at all,
    that's the worst of the three dubious aspects.

> I'll rebuild GHC without that patch locally but also wanted to check
> with you in the meantime.

Right, and you might also try convincing ./configure (or whatever is
used in the build system) to switch HAVE_LIBCHARSET off, unless that
has unrelated detrimental effects.

Yours,
  Ingo

Reply | Threaded
Open this post in threaded view
|

Re: HS_ENCODING in localeEncoding

Matthias Kilian
In reply to this post by Greg Steuck
Hi Greg,

On Sun, Feb 02, 2020 at 10:00:06PM -0800, Greg Steuck wrote:
> Do you believe lang/ghc/patches/patch-libraries_base_cbits_PrelIOUtils_c
> is still relevant?

It's probably outdated junk. What a mess ;-)

> I ran some test programs and both locale_charset() and
> nl_langinfo(CODESET) yield sensible results (at least for UTF8). The
> history of the patch seems to go back approximately 10 years, so maybe
> OpenBSD locale support improved enough to consider dropping the patch?
> I'll rebuild GHC without that patch locally but also wanted to check
> with you in the meantime.

IIRC, the complaint (from that time 10 years ago) was that ghci
crached at the same moment you enter a character not supported by
whatever LC_CTYPE contains.

So, yes, if  you have the time to rebuild without that patch and
test it , I'll love it.

(Can't  do this for mmyself, because my port building machine still
causes troubles)

Ciao,
        Kili

Reply | Threaded
Open this post in threaded view
|

Re: HS_ENCODING in localeEncoding

Greg Steuck
Hi Matthias,

On Mon, Feb 3, 2020 at 3:31 PM Matthias Kilian <[hidden email]> wrote:
> IIRC, the complaint (from that time 10 years ago) was that ghci
> crached at the same moment you enter a character not supported by
> whatever LC_CTYPE contains.
>
> So, yes, if  you have the time to rebuild without that patch and
> test it , I'll love it.

See the attached patch. It removes the three patches with HS_ENCODING
in them and adds a configure option as suggested by Ingo.

So far I confirmed that:
  * getLocaleEncoding is correctly steered by LC_ALL
  * utf8 characters are accepted by ghci and correctly parsed into
String literals
  * utf8 characters are printed back correctly
  * utf8 characters become ? when locale is set to ASCII
By utf8 I mean Cyrillic :)

Thanks
Greg
--
nest.cx is Gmail hosted, use PGP: https://pgp.key-server.io/0x0B1542BD8DF5A1B0
Fingerprint: 5E2B 2D0E 1E03 2046 BEC3  4D50 0B15 42BD 8DF5 A1B0

0001-Remove-HS_ENCODING.-LC_ALL-works-now.patch (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: HS_ENCODING in localeEncoding

Greg Steuck
In reply to this post by Ingo Schwarze
Hi Ingo,

On Mon, Feb 3, 2020 at 6:31 AM Ingo Schwarze <[hidden email]> wrote:
> The function locale_charset() appears to be part of the converters/libiconv
> package, but it appears to be totally undocumented.  I failed to
> find any documentation whatsoever: neither in the package nor even
> with Google on the web.  While iconv(3) is in POSIX, locale_charset()
> is not.  I think we should avoid using stuff when it is unclear
> what it is, so consider forcing HAVE_LIBCHARSET to undefined in the
> port (i'm not saying you must do this, only that it could be
> considered).

Done in my patch.

> And indeed, UTF-8 is the only non-POSIX charset that OpenBSD supports,
> and there are no plans to support any others in the future.

Seems to be working as advertised.

> Consequently, nl_langinfo(CODESET) will always return either "US-ASCII"
> or "UTF-8" on OpenBSD.

Yes, as prominently documented in the man pages :)

> Three apsects of the patch look bogus in particular:
>
>  1. Avoiding nl_langinfo(3).
>     There is no longer anything wrong with that function.

Confirmed.

>  2. Inspecting HS_ENCODING.
>     It appears the code wants to inspect the current locale.
>     If any environment variable, then LC_ALL/LC_CTYPE/LANG
>     should probably be inspected, not HS_ENCODING.
>     The code in the patch gives me the impression that this is
>     probably already done with setlocale(3) elsewehere in the
>     program, so i'm not sure the environment should be inspected
>     again.

Yes, setlocale is one of the first thing done by GHC runtime.
https://github.com/ghc/ghc/blob/7f72b540288bbdb32a6750dd64b9d366501ed10c/rts/RtsStartup.c#L156

> Right, and you might also try convincing ./configure (or whatever is
> used in the build system) to switch HAVE_LIBCHARSET off, unless that
> has unrelated detrimental effects.

Thanks for the tip! Done.

Thanks
Greg
--
nest.cx is Gmail hosted, use PGP: https://pgp.key-server.io/0x0B1542BD8DF5A1B0
Fingerprint: 5E2B 2D0E 1E03 2046 BEC3  4D50 0B15 42BD 8DF5 A1B0

Reply | Threaded
Open this post in threaded view
|

Re: HS_ENCODING in localeEncoding

Greg Steuck
In reply to this post by Greg Steuck
On Tue, Feb 4, 2020 at 10:38 AM Greg Steuck <[hidden email]> wrote:
> So far I confirmed that:
>   * getLocaleEncoding is correctly steered by LC_ALL
>   * utf8 characters are accepted by ghci and correctly parsed into
> String literals
>   * utf8 characters are printed back correctly
>   * utf8 characters become ? when locale is set to ASCII

Happily the patch also fixed the original bug that triggered my
exploration: xmobar now displays unicode correctly given
LC_ALL=en_US.UTF-8.

Thanks
Greg
--
nest.cx is Gmail hosted, use PGP: https://pgp.key-server.io/0x0B1542BD8DF5A1B0
Fingerprint: 5E2B 2D0E 1E03 2046 BEC3  4D50 0B15 42BD 8DF5 A1B0

Reply | Threaded
Open this post in threaded view
|

Re: HS_ENCODING in localeEncoding

Matthias Kilian
Hi Greg,
On Tue, Feb 04, 2020 at 09:28:20PM -0800, Greg Steuck wrote:

> On Tue, Feb 4, 2020 at 10:38 AM Greg Steuck <[hidden email]> wrote:
> > So far I confirmed that:
> >   * getLocaleEncoding is correctly steered by LC_ALL
> >   * utf8 characters are accepted by ghci and correctly parsed into
> > String literals
> >   * utf8 characters are printed back correctly
> >   * utf8 characters become ? when locale is set to ASCII
>
> Happily the patch also fixed the original bug that triggered my
> exploration: xmobar now displays unicode correctly given
> LC_ALL=en_US.UTF-8.

Thanks for all the testing and for the patch.

I'll do a test build myself (hopefully tomorrow or on friday) and
commit your changes.

Ciao,
        Kili