grep ".one\|.two" doesn't work on OpenBSD. Is it expected?


Juan Francisco Cantero Hurtado
I've a test in one of my ports similar to this:

$ cat test.txt
$TESTTMP/hgcache/master/packs/7bcd2d90b99395ca43172a0dd24e18860b2902f9.histpack
$TESTTMP/hgcache/master/packs/dc8f8fdc76690ce27791ce9f53a18da379e50d37.datapack
$ cat test.txt | grep ".datapack\|.histpack"
$ cat test.txt | ggrep ".datapack\|.histpack"
$TESTTMP/hgcache/master/packs/7bcd2d90b99395ca43172a0dd24e18860b2902f9.histpack
$TESTTMP/hgcache/master/packs/dc8f8fdc76690ce27791ce9f53a18da379e50d37.datapack

The grep command works with GNU, NetBSD, FreeBSD and BusyBox. It fails
on OpenBSD and Solaris 11. I'm suggesting upstream to change the command
to "grep -e ".datapack" -e ".histpack"" but I would like to know if this
is a bug or we just don't support the | in the grep patterns.

Cheers.


--
Juan Francisco Cantero Hurtado http://juanfra.info
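
The workaround proposed in the message above can be sketched as follows. This is only an illustration, assuming a POSIX sh: the two pack lines are copied from test.txt, and the third line (repacklock) is a made-up non-matching entry added so the filtering is visible.

```shell
# Stand-in for test.txt. Single quotes keep "$TESTTMP" literal, as in the
# original file. The "repacklock" line is hypothetical, added for contrast.
lines='$TESTTMP/hgcache/master/packs/7bcd2d90b99395ca43172a0dd24e18860b2902f9.histpack
$TESTTMP/hgcache/master/packs/dc8f8fdc76690ce27791ce9f53a18da379e50d37.datapack
$TESTTMP/hgcache/master/packs/repacklock'

# Each -e supplies one basic regular expression; a line is selected if
# any of them matches, so no alternation operator is needed.
matched=$(printf '%s\n' "$lines" | grep -e '.datapack' -e '.histpack' || true)

printf '%s\n' "$matched"
```

Multiple -e options are part of the POSIX grep interface, so this form does not depend on how a given grep treats "\|".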


Re: grep ".one\|.two" doesn't work on OpenBSD. Is it expected?

Todd C. Miller
On Mon, 20 May 2019 20:01:12 +0200, Juan Francisco Cantero Hurtado wrote:

> The grep command works with GNU, NetBSD, FreeBSD and BusyBox. It fails
> on OpenBSD and Solaris 11. I'm suggesting upstream to change the command
> to "grep -e ".datapack" -e ".histpack"" but I would like to know if this
> is a bug or we just don't support the | in the grep patterns.

That's GNU regexp format, not POSIX.  The standard way to do this
is with an extended regular expression using egrep.  E.g.

    cat test.txt | egrep ".datapack|.histpack"

though you should escape the '.' if you want it to match literally.

 - todd
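
The remark about escaping the '.' can be illustrated with a small sketch. It assumes a POSIX sh and uses grep -E, the option form of egrep; the file names are invented for the example, including a decoy line that contains no literal dot before "datapack".

```shell
# Hypothetical input: two pack names and one decoy without a literal dot.
input='cache/aaaa.histpack
cache/bbbb.datapack
cache/cccc_datapack'

# Unescaped: '.' matches any single character, so the decoy is selected too.
loose=$(printf '%s\n' "$input" | grep -E '.datapack|.histpack' || true)

# Escaped: '\.' matches only a literal dot, so the decoy is dropped.
strict=$(printf '%s\n' "$input" | grep -E '\.datapack|\.histpack' || true)

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

With the unescaped pattern all three lines are selected; with the escaped pattern only the two real pack names remain.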


Re: grep ".one\|.two" doesn't work on OpenBSD. Is it expected?

Paul de Weerd
In reply to this post by Juan Francisco Cantero Hurtado
On Mon, May 20, 2019 at 08:01:12PM +0200, Juan Francisco Cantero Hurtado wrote:
| I've a test in one of my ports similar to this:
|
| $ cat test.txt
| $TESTTMP/hgcache/master/packs/7bcd2d90b99395ca43172a0dd24e18860b2902f9.histpack
| $TESTTMP/hgcache/master/packs/dc8f8fdc76690ce27791ce9f53a18da379e50d37.datapack
| $ cat test.txt | grep ".datapack\|.histpack"
| $ cat test.txt | ggrep ".datapack\|.histpack"
| $TESTTMP/hgcache/master/packs/7bcd2d90b99395ca43172a0dd24e18860b2902f9.histpack
| $TESTTMP/hgcache/master/packs/dc8f8fdc76690ce27791ce9f53a18da379e50d37.datapack
|
| The grep command works with GNU, NetBSD, FreeBSD and BusyBox. It fails
| on OpenBSD and Solaris 11. I'm suggesting upstream to change the command
| to "grep -e ".datapack" -e ".histpack"" but I would like to know if this
| is a bug or we just don't support the | in the grep patterns.

Try grep -E, or egrep, for extended regular expression matching:

[weerd@pom] $ cat sample
a
b
c
[weerd@pom] $ grep 'a|b' sample
[weerd@pom] $ grep -E 'a|b' sample
a
b

The standard grep(1) defaults to "basic" Regular Expressions, whereas
the branch-feature is part of Extended Regular Expressions (ERE).  See
re_format(7) for details.

Cheers,

Paul 'WEiRD' de Weerd

--
>++++++++[<++++++++++>-]<+++++++.>+++[<------>-]<.>+++[<+
+++++++++++>-]<.>++[<------------>-]<+.--------------.[-]
                 http://www.weirdnet.nl/                 
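
The BRE/ERE difference described in the message above can be reproduced without a file on disk. A minimal sketch, assuming a POSIX sh; the variable names are made up for the example.

```shell
# Same three lines as the "sample" file in the message above.
sample='a
b
c'

# BRE (the default): '|' is an ordinary character, so this searches for
# the literal three-character string "a|b" and selects nothing.
bre_out=$(printf '%s\n' "$sample" | grep 'a|b' || true)

# ERE (-E): '|' separates alternatives, so lines "a" and "b" are selected.
ere_out=$(printf '%s\n' "$sample" | grep -E 'a|b' || true)

printf 'BRE: [%s]\n' "$bre_out"
printf 'ERE: [%s]\n' "$ere_out"
```

The "|| true" only keeps the script going when grep exits non-zero because nothing matched; it does not change what is selected.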


Re: grep ".one\|.two" doesn't work on OpenBSD. Is it expected?

Juan Francisco Cantero Hurtado
In reply to this post by Todd C. Miller
On Mon, May 20, 2019 at 01:22:21PM -0600, Todd C. Miller wrote:

> On Mon, 20 May 2019 20:01:12 +0200, Juan Francisco Cantero Hurtado wrote:
>
> > The grep command works with GNU, NetBSD, FreeBSD and BusyBox. It fails
> > on OpenBSD and Solaris 11. I'm suggesting upstream to change the command
> > to "grep -e ".datapack" -e ".histpack"" but I would like to know if this
> > is a bug or we just don't support the | in the grep patterns.
>
> That's GNU regexp format, not POSIX.  The standard way to do this
> is with an extended regular expression using egrep.  E.g.
>
>     cat test.txt | egrep ".datapack|.histpack"
>
> though you should escape the '.' if you want it to match literally.

Yes, that worked. Many thanks for the help, guys.


--
Juan Francisco Cantero Hurtado http://juanfra.info


Re: grep ".one\|.two" doesn't work on OpenBSD. Is it expected?

Ingo Schwarze
In reply to this post by Todd C. Miller
Hi,

Todd C. Miller wrote on Mon, May 20, 2019 at 01:22:21PM -0600:
> On Mon, 20 May 2019 20:01:12 +0200, Juan Francisco Cantero Hurtado wrote:

>> The grep command works with GNU, NetBSD, FreeBSD and BusyBox. It fails
>> on OpenBSD and Solaris 11. I'm suggesting upstream to change the command
>> to "grep -e ".datapack" -e ".histpack"" but I would like to know if this
>> is a bug or we just don't support the | in the grep patterns.

> That's GNU regexp format, not POSIX.  The standard way to do this
> is with an extended regular expression using egrep.  E.g.
>
>     cat test.txt | egrep ".datapack|.histpack"
>
> though you should escape the '.' if you want it to match literally.

I just checked whether the meaning of "\|" is properly documented,
and it is:

  re_format(7):
     POSIX leaves some aspects of RE syntax and semantics open;
     '**' marks decisions on these aspects that may not be fully
     portable to other POSIX implementations.
     [...]
     An atom is [...]
     a '\' followed by one of the characters '^.[$()|*+?{\' (matching
     that character taken as an ordinary character, as if the '\'
     had not been present**), [...]

So, there is nothing to fix in the manual, and besides, our choice
does not violate POSIX:

  https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03

  9.3.2 BRE Ordinary Characters

  An ordinary character is a BRE that matches itself: any character
  in the supported character set, except for the BRE special
  characters listed in BRE Special Characters.

  The interpretation of an ordinary character preceded by an unescaped
  <backslash> ( '\\' ) is undefined, except for:  [...]

Since '|' is an ordinary character in a BRE, "\|" causes undefined behaviour.

Yours,
  Ingo
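
Because the result of "\|" in a BRE is undefined, a script can probe the local grep instead of assuming either behaviour. The following is a rough sketch, assuming a POSIX sh; the function name bre_has_alternation is invented for this example and is not part of any tool.

```shell
# Returns 0 if this grep treats "\|" in a BRE as alternation (an
# extension), non-zero if it treats it as a literal '|' or rejects it.
bre_has_alternation() {
    # The input "b" is selected only if 'a\|b' means "a or b"; read as a
    # literal, the pattern would need the three characters "a|b".
    printf 'b\n' | grep 'a\|b' >/dev/null 2>&1
}

if bre_has_alternation; then
    printf '%s\n' 'this grep accepts \| as BRE alternation (an extension)'
else
    printf '%s\n' 'this grep does not treat \| as alternation'
fi

# Forms defined by POSIX, which do not depend on the probe result:
printf 'b\n' | grep -E 'a|b'
printf 'b\n' | grep -e 'a' -e 'b'
```

Which branch is taken depends on the grep being probed. The last two commands use only ERE alternation and repeated -e options, both of which POSIX specifies.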