library: Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and \n

Michael Paoli
> Synopsis:      Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and \n
> Category:      library
> Environment:
         System      : OpenBSD 6.7
         Details     : OpenBSD 6.7 (GENERIC) #7: Wed Jan  6 15:19:25 MST 2021
                           [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC

         Architecture: OpenBSD.amd64
         Machine     : amd64
> Description:
         Certain BREs fail or misbehave unexpectedly.
         The failures are the same in both grep and sed (without -E).
         They occur only with certain combinations of the
         \{\}, \(\), and \n (where n is a digit) syntax; dropping
         any one of those generally keeps the bug from triggering.
         The bug is most clearly visible in the unexpected
         behavior of the \{m,n\} portion in the given context.
         If more of the (apparently dependent) context is removed,
         the bug doesn't show up.  E.g., some of the clearest cases
         involve replacing * with \{0,\} in the BRE: one would expect
         identical results, but gets quite different ones instead.
         These same BREs work as expected with sed and grep on
         both Solaris 11 and GNU/Linux.
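         A minimal sketch of the equivalence the description relies on.
         The helpers match_state and compare_bre are hypothetical names
         (not part of grep or sed); they report whether grep gives the same
         match/no-match result for two BREs that should be interchangeable,
         such as Y* and Y\{0,\}.  Going by the run shown under How-To-Repeat,
         OpenBSD 6.7 would presumably print DISAGREE for both pairs, while
         Solaris 11 and GNU/Linux would print agree.

```shell
#!/bin/sh
# Sketch: does grep treat two BREs that should be equivalent
# (e.g. Y* and Y\{0,\}) the same way on one input line?
# match_state and compare_bre are hypothetical helper names.

# match_state STRING BRE -> prints "match" or "nomatch"
match_state() {
    if printf '%s\n' "$1" | grep -q -e "$2"; then
        echo match
    else
        echo nomatch
    fi
}

# compare_bre STRING BRE1 BRE2 -> prints "agree" or "DISAGREE: r1/r2"
compare_bre() {
    cb_r1=$(match_state "$1" "$2")
    cb_r2=$(match_state "$1" "$3")
    if [ "$cb_r1" = "$cb_r2" ]; then
        echo agree
    else
        echo "DISAGREE: $cb_r1/$cb_r2"
    fi
}

# Y* and Y\{0,\} should be interchangeable in these BREs.
compare_bre YYxx   'Y*\(x\)\1'  'Y\{0,\}\(x\)\1'
compare_bre YYxyxy 'Y*\(xy\)\1' 'Y\{0,\}\(xy\)\1'
```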
> How-To-Repeat:
         This example code illustrates the problem, showing both
         cases where the bug shows up and slightly different
         contexts where it does not occur.
         In each case, the output should be the STRING we echo
         into grep/sed, which match it against our BRE, but in the
         bug cases we get no output.
         It's also suggested that these be added as regression test
         cases, to catch the bug should it recur.  :-)
         Example code to show where the bug does (and doesn't) show up:
         (
                 exec 2>&1
                 set -- \
                         'YYxx' 'Y*\(x\)\1' \
                         'YYxx' 'Y\{0,\}\(x\)\1' \
                         'YYxx' 'Y\{2,\}\(x\)\1' \
                         'YYxx' 'Y\{0,\}\(x\)' \
                         'YYxx' 'Y\{2,\}x' \
                         'YYxx' 'Y\{2,\}x\{1,\}' \
                         'YYxx' 'Y\{2,\}x\{0,\}' \
                         'YYxxz' 'Y\{2,\}x\{0,\}z' \
                         'YYxxz' 'Y\{0,\}x\{0,\}z' \
                         'YYxyxy' 'Y\{2,\}\(xy\)\1' \
                         'YYxyxy' 'Y\{0,\}\(xy\)\1' \
                         'YYxyxy' 'Y*\(xy\)\1' \
                         'YYxyxy' 'Y\{0,\}\(xy\)xy'
                 while [ "$#" -ge 2 ]
                 do
                         STRING="$1"; shift; BRE="$1"; shift
                         set -x
                         echo "$STRING" | grep -e "$BRE"
                         echo "$STRING" | sed -ne "s/$BRE/&/p"
                         set +x
                 done
         )
         Example run of the above code.  The bug is present wherever
         the STRING echoed into grep/sed fails to appear in the
         output:
         + echo YYxx
         + grep -e Y*\(x\)\1
         YYxx
         + echo YYxx
         + sed -ne s/Y*\(x\)\1/&/p
         YYxx
         + set +x
         + echo YYxx
         + grep -e Y\{0,\}\(x\)\1
         + echo YYxx
         + sed -ne s/Y\{0,\}\(x\)\1/&/p
         + set +x
         + echo YYxx
         + grep -e Y\{2,\}\(x\)\1
         YYxx
         + echo YYxx
         + sed -ne s/Y\{2,\}\(x\)\1/&/p
         YYxx
         + set +x
         + echo YYxx
         + grep -e Y\{0,\}\(x\)
         YYxx
         + echo YYxx
         + sed -ne s/Y\{0,\}\(x\)/&/p
         YYxx
         + set +x
         + echo YYxx
         + grep -e Y\{2,\}x
         YYxx
         + echo YYxx
         + sed -ne s/Y\{2,\}x/&/p
         YYxx
         + set +x
         + echo YYxx
         + grep -e Y\{2,\}x\{1,\}
         YYxx
         + echo YYxx
         + sed -ne s/Y\{2,\}x\{1,\}/&/p
         YYxx
         + set +x
         + echo YYxx
         + grep -e Y\{2,\}x\{0,\}
         YYxx
         + echo YYxx
         + sed -ne s/Y\{2,\}x\{0,\}/&/p
         YYxx
         + set +x
         + echo YYxxz
         + grep -e Y\{2,\}x\{0,\}z
         YYxxz
         + echo YYxxz
         + sed -ne s/Y\{2,\}x\{0,\}z/&/p
         YYxxz
         + set +x
         + echo YYxxz
         + grep -e Y\{0,\}x\{0,\}z
         YYxxz
         + echo YYxxz
         + sed -ne s/Y\{0,\}x\{0,\}z/&/p
         YYxxz
         + set +x
         + echo YYxyxy
         + grep -e Y\{2,\}\(xy\)\1
         YYxyxy
         + echo YYxyxy
         + sed -ne s/Y\{2,\}\(xy\)\1/&/p
         YYxyxy
         + set +x
         + echo YYxyxy
         + grep -e Y\{0,\}\(xy\)\1
         + echo YYxyxy
         + sed -ne s/Y\{0,\}\(xy\)\1/&/p
         + set +x
         + echo YYxyxy
         + grep -e Y*\(xy\)\1
         YYxyxy
         + echo YYxyxy
         + sed -ne s/Y*\(xy\)\1/&/p
         YYxyxy
         + set +x
         + echo YYxyxy
         + grep -e Y\{0,\}\(xy\)xy
         YYxyxy
         + echo YYxyxy
         + sed -ne s/Y\{0,\}\(xy\)xy/&/p
         YYxyxy
         + set +x
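         Following the suggestion above to add regression tests, a
         self-checking variant of the example code might look like this.
         The names check_bre and run_bre_regressions are hypothetical, not
         taken from any existing OpenBSD regress suite.  Each case passes
         only if both grep and sed (no -E) echo STRING back unchanged, and
         run_bre_regressions returns non-zero if any case fails.

```shell
#!/bin/sh
# Sketch of a regression check for the BRE cases in this report.
# check_bre and run_bre_regressions are hypothetical names.
# Assumptions: BREs contain no '/' (the sed delimiter) and no
# whitespace (the case list below is split on blanks).

# check_bre STRING BRE -> prints a PASS/FAIL line; returns 0 on pass
check_bre() {
    cb_str=$1
    cb_bre=$2
    cb_grep=$(printf '%s\n' "$cb_str" | grep -e "$cb_bre")
    cb_sed=$(printf '%s\n' "$cb_str" | sed -ne "s/$cb_bre/&/p")
    if [ "$cb_grep" = "$cb_str" ] && [ "$cb_sed" = "$cb_str" ]; then
        printf 'PASS  %-7s %s\n' "$cb_str" "$cb_bre"
        return 0
    fi
    printf 'FAIL  %-7s %s\n' "$cb_str" "$cb_bre"
    return 1
}

# run_bre_regressions -> runs every case; returns 0 only if all pass
run_bre_regressions() {
    rr_fails=0
    # The quoted 'EOF' keeps the backslashes in the BREs intact.
    while read -r rr_str rr_bre; do
        check_bre "$rr_str" "$rr_bre" || rr_fails=$((rr_fails + 1))
    done <<'EOF'
YYxx Y*\(x\)\1
YYxx Y\{0,\}\(x\)\1
YYxx Y\{2,\}\(x\)\1
YYxx Y\{0,\}\(x\)
YYxx Y\{2,\}x
YYxx Y\{2,\}x\{1,\}
YYxx Y\{2,\}x\{0,\}
YYxxz Y\{2,\}x\{0,\}z
YYxxz Y\{0,\}x\{0,\}z
YYxyxy Y\{2,\}\(xy\)\1
YYxyxy Y\{0,\}\(xy\)\1
YYxyxy Y*\(xy\)\1
YYxyxy Y\{0,\}\(xy\)xy
EOF
    echo "$rr_fails failure(s)"
    [ "$rr_fails" -eq 0 ]
}

run_bre_regressions
```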
> Fix:
         No known general work-around
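         Not a general work-around, but in the two failing cases in the
         run above, writing * in place of \{0,\} produced the expected
         output.  The hypothetical helper bre_zero_star applies that
         substitution textually before a BRE is handed to grep or sed.  It
         is a blind string rewrite, not a BRE parser; it does nothing for
         other \{m,n\} forms, and whether * sidesteps the bug in every
         context is unknown.

```shell
#!/bin/sh
# Hypothetical narrow mitigation: rewrite every literal "\{0,\}" in a
# BRE to "*", which behaved correctly in the failing cases reported.
# Assumption: the BRE does not use "\{0,\}" where "*" would mean
# something else (e.g. at the very start of the BRE, or inside a
# bracket expression).

# bre_zero_star BRE -> prints BRE with each \{0,\} replaced by *
bre_zero_star() {
    # In this sed BRE, \\ matches a literal backslash, and {, }, 0 and ,
    # are ordinary characters; * in the replacement is literal.
    printf '%s\n' "$1" | sed 's/\\{0,\\}/*/g'
}

# Example: search with the rewritten form of a BRE from the report.
printf '%s\n' YYxx | grep -e "$(bre_zero_star 'Y\{0,\}\(x\)\1')"
```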