ed(1) text editor issue with Spanish accents

Alejandro G. Peregrina
Hello,

I've noticed something unexpected when entering an accent character
on its own (´) and then deleting it in ed(1) running in xterm(1). Instead
of being deleted, it is replaced by another character, which the
'misc-fixed' font renders as an inverted exclamation mark (?).

        How to reproduce:
$ uname -a
OpenBSD foo.my.domain 6.2 GENERIC.MP#1 amd64
$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE=en_US.UTF-8
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_MESSAGES="C"
LC_ALL=
$ #Let's append the ´ character in ed(1)
$ ed -p"> "
> a
´

        Now delete it with a backspace, press return to create a newline, enter
a dot to stop appending, and then print:

$ ed -p"> "
> a

.
> p
(?)

        (The (?) is a simulation of the font character that misc-fixed shows to
the terminal.)

        Whenever I use more(1) or less(1) to view it, it shows:

$ more test.txt
<C2>



I should add that I also tested this with urxvt(1), where ed(1) prints an Â
character, but more(1) and less(1) still print <C2>.

This can't be reproduced outside X. It is reproducible with
xterm(1) and urxvt(1) under both cwm(1) and fvwm(1). I've also tested
Linux and FreeBSD, and the behaviour is not reproducible there.

Thank you,
A

Re: ed(1) text editor issue with Spanish accents

Martijn van Duren
Hello Alejandro,

ed works on both binary and ASCII text, treating both as sequences of
individual bytes. Since ´ is a UTF-8 character, which consists of the bytes
C2 and B4, ed thinks it should delete only a single byte, which leaves C2.
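
To make the byte arithmetic concrete, here is a minimal sketch in Python
(chosen only for illustration; it is not ed's code). It encodes U+00B4 and
then drops the final byte, which is the byte-wise erase described above:

```python
# U+00B4 ACUTE ACCENT, the character typed in the report.
accent = "\u00b4"
encoded = accent.encode("utf-8")
print(encoded.hex(" "))        # c2 b4

# Byte-wise erase: remove only the final byte of the line.
after_erase = encoded[:-1]
print(after_erase.hex(" "))    # c2
```

The remaining C2 is a UTF-8 lead byte with no continuation byte after it.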

Your terminal can't interpret a lone C2, which in this particular case
results in a question mark.
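
As a rough illustration of why different programs show different things for
the same byte, the sketch below decodes a lone C2 three ways. Which of these
a given terminal or pager actually does is an assumption here; the Latin-1
result would merely be consistent with the Â reported for urxvt.

```python
lone = b"\xc2"  # what is left after the byte-wise erase

# Strict UTF-8 decoding rejects the truncated sequence.
try:
    lone.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decoder substitutes U+FFFD REPLACEMENT CHARACTER.
replaced = lone.decode("utf-8", errors="replace")

# Read as ISO-8859-1 (Latin-1), the same byte is U+00C2.
latin1 = lone.decode("latin-1")

print(strict_ok, hex(ord(replaced)), hex(ord(latin1)))
```

A pager that refuses to guess can instead print the raw byte value, which
matches the <C2> seen in more(1) and less(1).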

The character disappears from the screen after the backspace because the
presentation layer is instructed to clear the column before the current
position, so it appears to have been deleted.

Currently there's no UTF-8 support in our ed, and I don't see how this
can be done without endangering the binary editing capabilities.
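
For illustration only, a character-aware erase could look roughly like the
sketch below. The function name erase_last_char is hypothetical and this is
not a proposal for ed; it only shows the general technique of stepping back
over UTF-8 continuation bytes (bit pattern 10xxxxxx) and falling back to a
single byte when the tail is not valid UTF-8.

```python
def erase_last_char(buf: bytes) -> bytes:
    """Remove the last UTF-8 character, or one byte if the tail is not UTF-8."""
    if not buf:
        return buf
    i = len(buf) - 1
    # Step back over continuation bytes (10xxxxxx), at most three of them.
    while i > 0 and len(buf) - 1 - i < 3 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    # Cut at i only if the tail is one complete, valid character.
    try:
        buf[i:].decode("utf-8")
        return buf[:i]
    except UnicodeDecodeError:
        return buf[:-1]  # not valid UTF-8: fall back to a single byte


print(erase_last_char("a\u00b4".encode("utf-8")))  # b'a'
print(erase_last_char(b"\xff\x80"))                 # b'\xff'
```

The fallback also shows the risk mentioned above: binary data that happens
to form a valid UTF-8 sequence would lose several bytes on a single erase.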

martijn@

On 12/04/17 00:43, Alejandro G. Peregrina wrote:

> Hello,
>
> I've noticed something unexpected when entering an accent character
> on its own (´) and then deleting it in ed(1) running in xterm(1). Instead
> of being deleted, it is replaced by another character, which the
> 'misc-fixed' font renders as an inverted exclamation mark (?).
>
> How to reproduce:
> $ uname -a
> OpenBSD foo.my.domain 6.2 GENERIC.MP#1 amd64
> $ locale
> LANG=
> LC_COLLATE="C"
> LC_CTYPE=en_US.UTF-8
> LC_MONETARY="C"
> LC_NUMERIC="C"
> LC_TIME="C"
> LC_MESSAGES="C"
> LC_ALL=
> $ #Let's append the ´ character in ed(1)
> $ ed -p"> "
>> a
> ´
>
> Now delete it with a backspace, press return to create a newline, enter
> a dot to stop appending, and then print:
>
> $ ed -p"> "
>> a
>
> .
>> p
> (?)
>
> (The (?) is a simulation of the font character that misc-fixed shows to
> the terminal.)
>
> Whenever I use more(1) or less(1) to view it, it shows:
>
> $ more test.txt
> <C2>
>
>
>
> I should add that I also tested this with urxvt(1), where ed(1) prints an Â
> character, but more(1) and less(1) still print <C2>.
>
> This can't be reproduced outside X. It is reproducible with
> xterm(1) and urxvt(1) under both cwm(1) and fvwm(1). I've also tested
> Linux and FreeBSD, and the behaviour is not reproducible there.
>
> Thank you,
> A
>