ksh: fix input handling for 4 byte UTF-8 sequences

Sören Tempel-2

Currently, ksh does not correctly calculate the length of 4 byte UTF-8
sequences in emacs input mode. For demonstration purposes try inputting
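an emoji (e.g. U+1F421) at your shell prompt. These 4 byte sequences can
be identified by checking if the four most significant bits of the first
byte are set and the fifth isn't. The current check only compares the
first byte against 0xf4 and therefore misses every other 4 byte lead
byte, including the 0xf0 that starts U+1F421.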

The patch below fixes this, thereby allowing users to enter emojis
(and other 4 byte UTF-8 sequences) at their shell prompt in emacs mode:


diff --git bin/ksh/emacs.c bin/ksh/emacs.c
index 694c402ff..970a0989d 100644
--- bin/ksh/emacs.c
+++ bin/ksh/emacs.c
@@ -1851,7 +1851,7 @@ x_e_getu8(char *buf, int off)
  return -1;
  buf[off++] = c;
- if (c == 0xf4)
+ if ((c & 0xf8) == 0xf0)
  len = 4;
  else if ((c & 0xf0) == 0xe0)
  len = 3;
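
For anyone who wants to check the bit test outside of ksh, here is a
minimal standalone sketch. The function names u8_lead_len and
old_is_4byte are hypothetical and do not exist in emacs.c. The 2 byte
and 1 byte branches are my assumption about how the rest of the chain
continues, since the hunk above ends at the 3 byte case. The mask also
accepts 0xf5..0xf7, which are not valid UTF-8 lead bytes; the sketch
makes no attempt to reject them.

```c
/*
 * Hypothetical helper, not part of ksh: return the sequence length
 * implied by a UTF-8 lead byte, using the same masks as the patch.
 */
int
u8_lead_len(unsigned char c)
{
	if ((c & 0xf8) == 0xf0)		/* 11110xxx: 4 byte sequence */
		return 4;
	if ((c & 0xf0) == 0xe0)		/* 1110xxxx: 3 byte sequence */
		return 3;
	if ((c & 0xe0) == 0xc0)		/* 110xxxxx: 2 byte sequence (assumed) */
		return 2;
	return 1;			/* everything else: single byte (assumed) */
}

/* The comparison removed by the patch, kept here for contrast. */
int
old_is_4byte(unsigned char c)
{
	return c == 0xf4;
}
```

U+1F421 is encoded as f0 9f 90 a1, so its lead byte is 0xf0: the old
comparison rejects it, while the masked test classifies it as the start
of a 4 byte sequence.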