ksh(1): overwritten prompt caused by UTF-8 character

Anton Lindqvist
I recently encountered a bug related to UTF-8 in ksh(1).

While inserting the following sequence, part of my prompt gets mangled:

  a<backward-char>ö

With PS1='ksh$ ' I expect the following output:

  ksh$ öa

... actual output:

  kshöaa

Examining the output buffer when the 'ö' character is inserted shows the
following, piped through hexdump:

00000000  c3 61 08                                          |.a.|
00000003

0xc3 is the first byte of the 'ö' character and the trailing backspace
(0x08) causes the cursor to move past the incomplete UTF-8 sequence. The
backspace is emitted by the following lines in function x_ins:

$ sed -n 460,464p /usr/src/bin/ksh/emacs.c
  if (adj == x_adj_done) {
    /* no */
    for (cp = xlp; cp > xcp; )
      x_bs(*--cp);
  }

A solution would be to only emit a backspace if cp[-1] is a UTF-8
continuation byte and cp[-2] a UTF-8 start byte. This removes one of
the erroneous backspaces that eat the prompt.

Examining the output buffer when the last byte (0xb6) of 'ö' is
inserted:

00000000  08 c3 b6 61 08                                    |...a.|

The leading erroneous backspace is caused by the following lines in
function x_zots, introduced in r1.64:

$ sed -n 687,691p bin/ksh/emacs.c
  if (str > xbuf && isu8cont(*str)) {
    while (str > xbuf && isu8cont(*str))
      str--;
    x_e_putc('\b');
  }

I haven't found any viable way to avoid emitting the backspace when a
character is prepended, as opposed to appended.

Any ideas on how to solve this issue would be much appreciated.

Re: ksh(1): overwritten prompt caused by UTF-8 character

Ingo Schwarze
Hi,

Anton Lindqvist wrote on Sun, Jan 22, 2017 at 02:57:12PM +0100:

> I recently encountered a bug related to UTF-8 in ksh(1).
>
> While inserting the following sequence, part of my prompt gets mangled:
>
>   a<backward-char>ö
>
> With PS1='ksh$ ' I expect the following output:
>
>   ksh$ öa
>
> ... actual output:
>
>   kshöaa

I cannot reproduce.  It works for me on OpenBSD-current (amd64).

Which version of OpenBSD are you using?

> Examining the output buffer when the 'ö' character is inserted
> shows the following, piped through hexdump:
>
> 00000000  c3 61 08                                          |.a.|
> 00000003
>
> 0xc3 is the first byte of the 'ö' character and the trailing
> backspace (0x08) causes the cursor to move past the incomplete UTF-8
> sequence.

I don't understand what you are talking about here.  In particular,
what is that "output buffer" you are talking about?

> The backspace is emitted by the following lines in function x_ins:
>
> $ sed -n 460,464p /usr/src/bin/ksh/emacs.c
>   if (adj == x_adj_done) {
>     /* no */
>     for (cp = xlp; cp > xcp; )
>       x_bs(*--cp);
>   }
>
> A solution would be to only emit a backspace if cp[-1] is a UTF-8
> continuation byte and cp[-2] a UTF-8 start byte. This removes one of
> the erroneous backspaces that eat the prompt.
>
> Examining the output buffer when the last byte (0xb6) of 'ö' is
> inserted:
>
> 00000000  08 c3 b6 61 08                                    |...a.|
>
> The leading erroneous backspace is caused by the following lines in
> function x_zots, introduced in r1.64:
>
> $ sed -n 687,691p bin/ksh/emacs.c
>   if (str > xbuf && isu8cont(*str)) {
>     while (str > xbuf && isu8cont(*str))
>       str--;
>     x_e_putc('\b');
>   }
>
> I haven't found any viable way to avoid emitting the backspace when a
> character is prepended, as opposed to appended.
>
> Any ideas on how to solve this issue would be much appreciated.

I neither understand the problem nor any part of your analysis.

Sorry,
  Ingo

Re: ksh(1): overwritten prompt caused by UTF-8 character

Anton Lindqvist
On Sun, Jan 22, 2017 at 03:55:25PM +0100, Ingo Schwarze wrote:

> Hi,
>
> Anton Lindqvist wrote on Sun, Jan 22, 2017 at 02:57:12PM +0100:
>
> > I recently encountered a bug related to UTF-8 in ksh(1).
> >
> > While inserting the following sequence, part of my prompt gets mangled:
> >
> >   a<backward-char>ö
> >
> > With PS1='ksh$ ' I expect the following output:
> >
> >   ksh$ öa
> >
> > ... actual output:
> >
> >   kshöaa
>
> I cannot reproduce.  It works for me on OpenBSD-current (amd64).
>
> Which version of OpenBSD are you using?

My bad, it turns out this problem is related to my terminal emulator
rather than ksh. I can't reproduce the problem in xterm or on the
console.

Sorry for the noise.