library: Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and \n

Michael Paoli
> Synopsis:      Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and \n
> Category:      library
> Environment:
         System      : OpenBSD 6.7
         Details     : OpenBSD 6.7 (GENERIC) #7: Wed Jan  6 15:19:25 MST 2021
                           
[hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC

         Architecture: OpenBSD.amd64
         Machine     : amd64
> Description:
         Certain BRE expressions fail/misbehave unexpectedly.
         The failures are the same in both grep and sed (without -E).
         The failures only occur with certain combinations of the
         \{\}, \(\), and \n (where n is a digit) syntax; dropping any
         one of those generally fails to trigger the bug.
         The bug/error can be seen most clearly in unexpected
         behavior of the \{m,n\} portion in the given context.
         If more of the (apparently dependent) context is removed,
         the bug doesn't show up.  E.g. some of the clearest cases
         involve replacing * with \{0,\} in the BRE, and getting
         quite unexpected results (one would expect the results
         to be the same).  These same BREs work under both
         Solaris 11 and GNU/Linux with their sed and grep.
> How-To-Repeat:
         This example code illustrates the problem. It shows both
         cases where the bug shows up and slightly differing
         contexts where the bug does not occur.
         In each of these cases, the output should be the STRING
         we set/echo into grep/sed where we use our BRE, but in the bug
         cases we get no output.
         It's also suggested that test cases be added to the code to
         catch possible regressions, should the issue recur.  :-)
         Example code to show where bug does (and doesn't) show up:
         (
                 exec 2>&1
                 set -- \
                         'YYxx' 'Y*\(x\)\1' \
                         'YYxx' 'Y\{0,\}\(x\)\1' \
                         'YYxx' 'Y\{2,\}\(x\)\1' \
                         'YYxx' 'Y\{0,\}\(x\)' \
                         'YYxx' 'Y\{2,\}x' \
                         'YYxx' 'Y\{2,\}x\{1,\}' \
                         'YYxx' 'Y\{2,\}x\{0,\}' \
                         'YYxxz' 'Y\{2,\}x\{0,\}z' \
                         'YYxxz' 'Y\{0,\}x\{0,\}z' \
                         'YYxyxy' 'Y\{2,\}\(xy\)\1' \
                         'YYxyxy' 'Y\{0,\}\(xy\)\1' \
                         'YYxyxy' 'Y*\(xy\)\1' \
                         'YYxyxy' 'Y\{0,\}\(xy\)xy'
                 while [ "$#" -ge 2 ]
                 do
                         STRING="$1"; shift; BRE="$1"; shift
                         set -x
                         echo "$STRING" | grep -e "$BRE"
                         echo "$STRING" | sed -ne "s/$BRE/&/p"
                         set +x
                 done
         )
         Example run of above code.  Bug is present where our
         STRING echoed into grep/sed fails to appear in the
         output:
         + echo YYxx
         + grep -e Y*\(x\)\1
         YYxx
         + echo YYxx
         + sed -ne s/Y*\(x\)\1/&/p
         YYxx
         + set +x
         + echo YYxx
         + grep -e Y\{0,\}\(x\)\1
         + echo YYxx
         + sed -ne s/Y\{0,\}\(x\)\1/&/p
         + set +x
         + echo YYxx
         + grep -e Y\{2,\}\(x\)\1
         YYxx
         + echo YYxx
         + sed -ne s/Y\{2,\}\(x\)\1/&/p
         YYxx
         + set +x
         + echo YYxx
         + grep -e Y\{0,\}\(x\)
         YYxx
         + echo YYxx
         + sed -ne s/Y\{0,\}\(x\)/&/p
         YYxx
         + set +x
         + echo YYxx
         + grep -e Y\{2,\}x
         YYxx
         + echo YYxx
         + sed -ne s/Y\{2,\}x/&/p
         YYxx
         + set +x
         + echo YYxx
         + grep -e Y\{2,\}x\{1,\}
         YYxx
         + echo YYxx
         + sed -ne s/Y\{2,\}x\{1,\}/&/p
         YYxx
         + set +x
         + echo YYxx
         + grep -e Y\{2,\}x\{0,\}
         YYxx
         + echo YYxx
         + sed -ne s/Y\{2,\}x\{0,\}/&/p
         YYxx
         + set +x
         + echo YYxxz
         + grep -e Y\{2,\}x\{0,\}z
         YYxxz
         + echo YYxxz
         + sed -ne s/Y\{2,\}x\{0,\}z/&/p
         YYxxz
         + set +x
         + echo YYxxz
         + grep -e Y\{0,\}x\{0,\}z
         YYxxz
         + echo YYxxz
         + sed -ne s/Y\{0,\}x\{0,\}z/&/p
         YYxxz
         + set +x
         + echo YYxyxy
         + grep -e Y\{2,\}\(xy\)\1
         YYxyxy
         + echo YYxyxy
         + sed -ne s/Y\{2,\}\(xy\)\1/&/p
         YYxyxy
         + set +x
         + echo YYxyxy
         + grep -e Y\{0,\}\(xy\)\1
         + echo YYxyxy
         + sed -ne s/Y\{0,\}\(xy\)\1/&/p
         + set +x
         + echo YYxyxy
         + grep -e Y*\(xy\)\1
         YYxyxy
         + echo YYxyxy
         + sed -ne s/Y*\(xy\)\1/&/p
         YYxyxy
         + set +x
         + echo YYxyxy
         + grep -e Y\{0,\}\(xy\)xy
         YYxyxy
         + echo YYxyxy
         + sed -ne s/Y\{0,\}\(xy\)xy/&/p
         YYxyxy
         + set +x
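
The comparison at the heart of the run above can be condensed into a small helper. The following is only a sketch: the function names `check_bre` and `compare_bre` are made up for illustration, and it uses `printf` rather than `echo` so that backslashes in the arguments are never interpreted by the shell. A grep error is reported the same way as a failed match. On an unaffected system both calls should report the pair as consistent; going by the run above, an affected system would be expected to report a mismatch.

```shell
#!/bin/sh
# check_bre STRING BRE
# Prints "match" if grep (BRE mode) finds BRE in STRING, else "no match".
check_bre() {
    if printf '%s\n' "$1" | grep -e "$2" >/dev/null 2>&1; then
        printf '%s\n' "match"
    else
        printf '%s\n' "no match"
    fi
}

# compare_bre STRING BRE_A BRE_B
# Reports whether two BREs that should be equivalent agree on STRING.
compare_bre() {
    result_a=$(check_bre "$1" "$2")
    result_b=$(check_bre "$1" "$3")
    if [ "$result_a" = "$result_b" ]; then
        printf 'consistent: %s and %s (%s)\n' "$2" "$3" "$result_a"
    else
        printf 'MISMATCH: %s -> %s, %s -> %s\n' "$2" "$result_a" "$3" "$result_b"
    fi
}

# The two pairs from the report where * and \{0,\} should be interchangeable.
compare_bre 'YYxx'   'Y*\(x\)\1'  'Y\{0,\}\(x\)\1'
compare_bre 'YYxyxy' 'Y*\(xy\)\1' 'Y\{0,\}\(xy\)\1'
```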
> Fix:
         No known general work-around
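
For the one narrow case the run above isolates, where `\{0,\}` fails but the same BRE written with `*` works, a pattern can be rewritten mechanically before use. This is a rough sketch and not a general work-around. The function name `star_for_zero_interval` is hypothetical. The substitution is purely textual, so it does not understand bracket expressions or an escaped backslash directly before `{0,\}`. It also does nothing for bounded intervals such as `\{0,3\}`.

```shell
#!/bin/sh
# star_for_zero_interval BRE
# Prints BRE with every literal \{0,\} replaced by *. Both forms mean
# "zero or more occurrences" in POSIX BRE syntax.
star_for_zero_interval() {
    printf '%s\n' "$1" | sed 's/\\{0,\\}/*/g'
}

# Rewrite one of the failing patterns from the report, then use it.
bre=$(star_for_zero_interval 'Y\{0,\}\(x\)\1')
printf '%s\n' "$bre"
printf '%s\n' 'YYxx' | grep -e "$bre"
```

Patterns without `\{0,\}` pass through unchanged, so the helper can be applied to every BRE in a script without special-casing.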

Re: library: Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and \n

Otto Moerbeek
On Tue, Feb 23, 2021 at 04:16:09AM -0800, Michael Paoli wrote:

> > Synopsis:      Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and \n
> [...]

Hi,

I can reproduce on current. Do you have an idea if NetBSD or FreeBSD
suffer from the same?

        -Otto

Re: library: Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and \n

Otto Moerbeek
On Fri, Apr 02, 2021 at 01:57:07PM +0200, Otto Moerbeek wrote:

> I can reproduce on current. Do you have an idea if NetBSD or FreeBSD
> suffer from the same?

These are the tests incorporated into our regress tests:

        -Otto

Index: tests
===================================================================
RCS file: /cvs/src/regress/lib/libc/regex/tests,v
retrieving revision 1.9
diff -u -p -r1.9 tests
--- tests 28 Dec 2020 21:41:55 -0000 1.9
+++ tests 2 Apr 2021 14:16:59 -0000
@@ -595,3 +595,18 @@ a?b - ab ab
 # FreeBSD PR 130504
 (.|())(b) - ab ab
 (()|.)(b) - ab ab
+
+# Some BRE cases where \{0,\} makes a backref go wrong, as reported by Michael Paoli
+Y*\(x\)\1 b YYxx YYxx
+Y\{2,\}\(x\)\1 b YYxx YYxx
+# Fails currently
+#Y\{0,\}\(x\)\1 b YYxx YYxx
+Y\{0,\}\(x\) b YYxx YYx
+Y\{2,\}x\{1,\} b YYxx YYxx
+Y\{2,\}x\{0,\}z b YYxxz YYxxz
+Y\{0,\}x\{0,\}z b YYxxz YYxxz
+Y\{2,\}\(xy\)\1 b YYxyxy YYxyxy
+# Fails currently
+#Y\{0,\}\(xy\)\1 b YYxyxy YYxyxy
+Y*\(xy\)\1 b YYxyxy YYxyxy
+Y\{0,\}\(xy\)xy b YYxyxy YYxyxy
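
Each added line reads as four whitespace-separated fields: the pattern, a flags field (`b` here, which in this test file's format marks the pattern as a BRE), the input string, and the text the match is expected to cover. A rough shell analogue of one such check is sketched below. It is not the regress harness itself, which is a C program driving regcomp(3) and regexec(3). `first_match` and `run_case` are hypothetical names, and `grep -o` is an extension rather than POSIX, though both GNU and OpenBSD grep provide it.

```shell
#!/bin/sh
# first_match BRE STRING
# Prints the text covered by the first match of BRE in STRING
# (nothing if there is no match). grep -o may report several
# matches per line, so only the first is kept.
first_match() {
    printf '%s\n' "$2" | grep -o -e "$1" | head -n 1
}

# run_case BRE STRING EXPECTED
# Mimics one line of the test file: compare the matched text to EXPECTED.
run_case() {
    got=$(first_match "$1" "$2")
    if [ "$got" = "$3" ]; then
        printf 'ok:   %s\n' "$1"
    else
        printf 'FAIL: %s (got "%s", expected "%s")\n' "$1" "$got" "$3"
    fi
}

# Three lines from the diff, including one currently commented out there.
run_case 'Y\{0,\}\(x\)'   'YYxx'   'YYx'
run_case 'Y\{0,\}\(x\)\1' 'YYxx'   'YYxx'
run_case 'Y*\(xy\)\1'     'YYxyxy' 'YYxyxy'
```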

Re: library: Basic Regular Expression (BRE) bug in \{m,n\} with \(\) and \n

Otto Moerbeek
On Fri, Apr 02, 2021 at 04:17:48PM +0200, Otto Moerbeek wrote:

> These are the tests incorporated into our regress tests:
> [...]

State of my investigation so far.

1. FreeBSD has the bug as well.

2. When backrefs are used, the regex engine takes a different path from
the ordinary one; this is reflected in the remarks in the BUGS section at
the end of re_format(5).

3. The engine code regcomp produces to match \{0,N\} uses an OR
instruction.

4. My suspicion is that the backref code does not handle those OR
instructions correctly.

5. But since backrefs are extremely ugly and slow (see again the BUGS
section), I don't think I'm very motivated to fix this.
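
Point 3 suggests checking whether other intervals with a lower bound of 0 trip the same path. A small probe along those lines is sketched below. It is an illustration rather than part of the investigation: the function name `probe_quantifier` and the list of quantifiers are chosen here as assumptions, and the thread only confirms the failure for `\{0,\}`. Since grep's match is unanchored, every quantifier listed should let `YYxx` match on a correct implementation.

```shell
#!/bin/sh
# probe_quantifier QUANT
# Builds the BRE  Y<QUANT>\(x\)\1  and reports whether grep finds it in
# the string YYxx.
probe_quantifier() {
    if printf '%s\n' 'YYxx' | grep -e "Y$1"'\(x\)\1' >/dev/null 2>&1; then
        printf 'ok    Y%s\\(x\\)\\1\n' "$1"
    else
        printf 'FAIL  Y%s\\(x\\)\\1\n' "$1"
    fi
}

# Quantifiers with and without a lower bound of 0 (hypothetical selection).
for quant in '*' '\{0,\}' '\{0,1\}' '\{0,2\}' '\{1,\}' '\{1,2\}' '\{2,\}'; do
    probe_quantifier "$quant"
done
```

Any line starting with FAIL marks a quantifier worth adding to the regress tests.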

        -Otto