sed behavior

classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

sed behavior

Nils Anspach
Hello list,

I have an issue with sed. Why does

        echo 'ab' | sed -E 's/a|$/x/g'

give 'x' whereas

        echo 'ab' | perl -e 'my $var = <>; $var =~ s/a|$/x/g; print $var'

and GNU's sed yield 'xbx' (which I expected to be the result in the
first case)?

Thanks
Nils

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

Ingo Schwarze
Hi Nils,

Nils Anspach wrote on Sun, Jun 12, 2011 at 12:49:42PM +0200:

> I have an issue with sed. Why does
>
> echo 'ab' | sed -E 's/a|$/x/g'
>
> give 'x' whereas

I sense a bug here.
Tracing a bit around process(),
it looks like the first application of the s command
yields dst = "x" continue_to_process = "b\n",
and then the second application
appends "\n" to dst (should rather append "b", i think).
Maybe something is wrong here with character/pointer counting,
but i'm somewhat out of time now for tracing.

This is worth more investigation.

Yours,
  Ingo

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

sven falempin
Hello,

Indeed there is a small problem:

# echo 'abbbbbbbbbbbbbfffff' | sed -E 's/[a$]/x/g'
xbbbbbbbbbbbbbfffff
# echo 'abbbbbbbbbbbbbfffff' | sed -E 's/a|$/x/g'
x

String modification is done inside the 'case 0:'
substitute(struct s_command *cp) in src/usr.bin/process.c

But the problem may comme from regexec_e.

Maybe openbsd devs should test another regexp code version ?

Hope it helps,
Who still use sed anyway :)

Regards.

2011/6/12 Ingo Schwarze <[hidden email]>

> Hi Nils,
>
> Nils Anspach wrote on Sun, Jun 12, 2011 at 12:49:42PM +0200:
>
> > I have an issue with sed. Why does
> >
> >       echo 'ab' | sed -E 's/a|$/x/g'
> >
> > give 'x' whereas
>
> I sense a bug here.
> Tracing a bit around process(),
> it looks like the first application of the s command
> yields dst = "x" continue_to_process = "b\n",
> and then the second application
> appends "\n" to dst (should rather append "b", i think).
> Maybe something is wrong here with character/pointer counting,
> but i'm somewhat out of time now for tracing.
>
> This is worth more investigation.
>
> Yours,
>   Ingo
>
>


--
---------------------------------------------------------------------------------------------------------------------
() ascii ribbon campaign - against html e-mail
/\

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

Otto Moerbeek
On Tue, Jun 14, 2011 at 11:56:27PM +0200, sven falempin wrote:

> Hello,
>
> Indeed there is a small problem:
>
> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/[a$]/x/g'
> xbbbbbbbbbbbbbfffff

That is expected. $ is only special when it ocurs as the list char of
a re.

> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/a|$/x/g'
> x

This is likely to be a real bug.

>
> String modification is done inside the 'case 0:'
> substitute(struct s_command *cp) in src/usr.bin/process.c
>
> But the problem may comme from regexec_e.
>
> Maybe openbsd devs should test another regexp code version ?

Why? If we should change libs on every bug encountered, nothing will
be left.

Anyway, thanks for the report.

        -Otto

>
> Hope it helps,
> Who still use sed anyway :)
>
> Regards.
>
> 2011/6/12 Ingo Schwarze <[hidden email]>
>
> > Hi Nils,
> >
> > Nils Anspach wrote on Sun, Jun 12, 2011 at 12:49:42PM +0200:
> >
> > > I have an issue with sed. Why does
> > >
> > >       echo 'ab' | sed -E 's/a|$/x/g'
> > >
> > > give 'x' whereas
> >
> > I sense a bug here.
> > Tracing a bit around process(),
> > it looks like the first application of the s command
> > yields dst = "x" continue_to_process = "b\n",
> > and then the second application
> > appends "\n" to dst (should rather append "b", i think).
> > Maybe something is wrong here with character/pointer counting,
> > but i'm somewhat out of time now for tracing.
> >
> > This is worth more investigation.
> >
> > Yours,
> >   Ingo
> >
> >
>
>
> --
> ---------------------------------------------------------------------------------------------------------------------
> () ascii ribbon campaign - against html e-mail
> /\

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

Otto Moerbeek
On Wed, Jun 15, 2011 at 07:44:20AM +0200, Otto Moerbeek wrote:

> On Tue, Jun 14, 2011 at 11:56:27PM +0200, sven falempin wrote:
>
> > Hello,
> >
> > Indeed there is a small problem:
> >
> > # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/[a$]/x/g'
> > xbbbbbbbbbbbbbfffff
>
> That is expected. $ is only special when it ocurs as the list char of
> a re.
>
> > # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/a|$/x/g'
> > x
>
> This is likely to be a real bug.
>
> >
> > String modification is done inside the 'case 0:'
> > substitute(struct s_command *cp) in src/usr.bin/process.c
> >
> > But the problem may comme from regexec_e.
> >
> > Maybe openbsd devs should test another regexp code version ?
>
> Why? If we should change libs on every bug encountered, nothing will
> be left.
>
> Anyway, thanks for the report.
>
> -Otto
>
> >
> > Hope it helps,
> > Who still use sed anyway :)
> >
> > Regards.
> >
> > 2011/6/12 Ingo Schwarze <[hidden email]>
> >
> > > Hi Nils,
> > >
> > > Nils Anspach wrote on Sun, Jun 12, 2011 at 12:49:42PM +0200:
> > >
> > > > I have an issue with sed. Why does
> > > >
> > > >       echo 'ab' | sed -E 's/a|$/x/g'
> > > >
> > > > give 'x' whereas
> > >
> > > I sense a bug here.
> > > Tracing a bit around process(),
> > > it looks like the first application of the s command
> > > yields dst = "x" continue_to_process = "b\n",
> > > and then the second application
> > > appends "\n" to dst (should rather append "b", i think).
> > > Maybe something is wrong here with character/pointer counting,
> > > but i'm somewhat out of time now for tracing.
> > >
> > > This is worth more investigation.
> > >
> > > Yours,
> > >   Ingo
> > >
> > >
> >
> >
> > --
> > ---------------------------------------------------------------------------------------------------------------------
> > () ascii ribbon campaign - against html e-mail
> > /\

This dif fixes your problem here. Big question is of course: does it
break other cases?

        -Otto

Index: process.c
===================================================================
RCS file: /cvs/src/usr.bin/sed/process.c,v
retrieving revision 1.15
diff -u -p -r1.15 process.c
--- process.c 27 Oct 2009 23:59:43 -0000 1.15
+++ process.c 15 Jun 2011 06:31:08 -0000
@@ -336,7 +336,9 @@ substitute(struct s_command *cp)
  switch (n) {
  case 0: /* Global */
  do {
- if (lastempty || match[0].rm_so != match[0].rm_eo) {
+ if (lastempty || match[0].rm_so != match[0].rm_eo ||
+    (match[0].rm_so == match[0].rm_eo &&
+    match[0].rm_so > 0)) {
  /* Locate start of replaced string. */
  re_off = match[0].rm_so;
  /* Copy leading retained string. */

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

Alexander Hall-3
On 06/15/11 08:35, Otto Moerbeek wrote:

> On Wed, Jun 15, 2011 at 07:44:20AM +0200, Otto Moerbeek wrote:
>
>> On Tue, Jun 14, 2011 at 11:56:27PM +0200, sven falempin wrote:
>>
>>> Hello,
>>>
>>> Indeed there is a small problem:
>>>
>>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/[a$]/x/g'
>>> xbbbbbbbbbbbbbfffff
>>
>> That is expected. $ is only special when it ocurs as the list char of
>> a re.
>>
>>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/a|$/x/g'
>>> x
>>
>> This is likely to be a real bug.
>>
>>>
>>> String modification is done inside the 'case 0:'
>>> substitute(struct s_command *cp) in src/usr.bin/process.c
>>>
>>> But the problem may comme from regexec_e.
>>>
>>> Maybe openbsd devs should test another regexp code version ?
>>
>> Why? If we should change libs on every bug encountered, nothing will
>> be left.
>>
>> Anyway, thanks for the report.
>>
>> -Otto
>>
>>>
>>> Hope it helps,
>>> Who still use sed anyway :)
>>>
>>> Regards.
>>>
>>> 2011/6/12 Ingo Schwarze <[hidden email]>
>>>
>>>> Hi Nils,
>>>>
>>>> Nils Anspach wrote on Sun, Jun 12, 2011 at 12:49:42PM +0200:
>>>>
>>>>> I have an issue with sed. Why does
>>>>>
>>>>>       echo 'ab' | sed -E 's/a|$/x/g'
>>>>>
>>>>> give 'x' whereas
>>>>
>>>> I sense a bug here.
>>>> Tracing a bit around process(),
>>>> it looks like the first application of the s command
>>>> yields dst = "x" continue_to_process = "b\n",
>>>> and then the second application
>>>> appends "\n" to dst (should rather append "b", i think).
>>>> Maybe something is wrong here with character/pointer counting,
>>>> but i'm somewhat out of time now for tracing.
>>>>
>>>> This is worth more investigation.
>>>>
>>>> Yours,
>>>>   Ingo
>>>>
>>>>
>>>
>>>
>>> --
>>> ---------------------------------------------------------------------------------------------------------------------
>>> () ascii ribbon campaign - against html e-mail
>>> /\
>
> This dif fixes your problem here. Big question is of course: does it
> break other cases?
>
> -Otto
>
> Index: process.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/sed/process.c,v
> retrieving revision 1.15
> diff -u -p -r1.15 process.c
> --- process.c 27 Oct 2009 23:59:43 -0000 1.15
> +++ process.c 15 Jun 2011 06:31:08 -0000
> @@ -336,7 +336,9 @@ substitute(struct s_command *cp)
>   switch (n) {
>   case 0: /* Global */
>   do {
> - if (lastempty || match[0].rm_so != match[0].rm_eo) {
> + if (lastempty || match[0].rm_so != match[0].rm_eo ||
> +    (match[0].rm_so == match[0].rm_eo &&
> +    match[0].rm_so > 0)) {
>   /* Locate start of replaced string. */
>   re_off = match[0].rm_so;
>   /* Copy leading retained string. */
>

Looks ok to me (I believe the problem was that prior to the intodution
of -E, any matching '$' would always be in the first match.

The diff doesn't break any of the regression tests (not that there are
a lot of them). While at it, here's another one! :-)

/Alexander


Index: Makefile
===================================================================
RCS file: /cvs/src/regress/usr.bin/sed/Makefile,v
retrieving revision 1.2
diff -u -p -r1.2 Makefile
--- Makefile 13 Oct 2008 13:27:33 -0000 1.2
+++ Makefile 15 Jun 2011 06:55:26 -0000
@@ -3,7 +3,7 @@

 SED= /usr/bin/sed

-REGRESS_TARGETS= sedtest hanoi math sierpinski
+REGRESS_TARGETS= sedtest hanoi math sierpinski eol

 sedtest:
  sh ${.CURDIR}/$@.sh ${SED} $@.out
@@ -19,6 +19,10 @@ math:

 sierpinski:
  ${SED} -nf ${.CURDIR}/$@.sed ${.CURDIR}/$@.in > $@.out
+ diff ${.CURDIR}/$@.expected $@.out
+
+eol:
+ ${SED} -Ef ${.CURDIR}/$@.sed ${.CURDIR}/$@.in > $@.out
  diff ${.CURDIR}/$@.expected $@.out

 CLEANFILES+=*.out lines* script*
Index: eol.expected
===================================================================
RCS file: eol.expected
diff -N eol.expected
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ eol.expected 15 Jun 2011 06:55:26 -0000
@@ -0,0 +1 @@
+xxxxbbbbccccx
Index: eol.in
===================================================================
RCS file: eol.in
diff -N eol.in
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ eol.in 15 Jun 2011 06:55:26 -0000
@@ -0,0 +1 @@
+aaaabbbbcccc
Index: eol.sed
===================================================================
RCS file: eol.sed
diff -N eol.sed
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ eol.sed 15 Jun 2011 06:55:26 -0000
@@ -0,0 +1 @@
+s/a|$/x/g

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

Alexander Hall-3
On 06/15/11 08:58, Alexander Hall wrote:

> On 06/15/11 08:35, Otto Moerbeek wrote:
>> On Wed, Jun 15, 2011 at 07:44:20AM +0200, Otto Moerbeek wrote:
>>
>>> On Tue, Jun 14, 2011 at 11:56:27PM +0200, sven falempin wrote:
>>>
>>>> Hello,
>>>>
>>>> Indeed there is a small problem:
>>>>
>>>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/[a$]/x/g'
>>>> xbbbbbbbbbbbbbfffff
>>>
>>> That is expected. $ is only special when it ocurs as the list char of
>>> a re.
>>>
>>>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/a|$/x/g'
>>>> x
>>>
>>> This is likely to be a real bug.
>>>
>>>>
>>>> String modification is done inside the 'case 0:'
>>>> substitute(struct s_command *cp) in src/usr.bin/process.c
>>>>
>>>> But the problem may comme from regexec_e.
>>>>
>>>> Maybe openbsd devs should test another regexp code version ?
>>>
>>> Why? If we should change libs on every bug encountered, nothing will
>>> be left.
>>>
>>> Anyway, thanks for the report.
>>>
>>> -Otto
>>>
>>>>
>>>> Hope it helps,
>>>> Who still use sed anyway :)
>>>>
>>>> Regards.
>>>>
>>>> 2011/6/12 Ingo Schwarze <[hidden email]>
>>>>
>>>>> Hi Nils,
>>>>>
>>>>> Nils Anspach wrote on Sun, Jun 12, 2011 at 12:49:42PM +0200:
>>>>>
>>>>>> I have an issue with sed. Why does
>>>>>>
>>>>>>       echo 'ab' | sed -E 's/a|$/x/g'
>>>>>>
>>>>>> give 'x' whereas
>>>>>
>>>>> I sense a bug here.
>>>>> Tracing a bit around process(),
>>>>> it looks like the first application of the s command
>>>>> yields dst = "x" continue_to_process = "b\n",
>>>>> and then the second application
>>>>> appends "\n" to dst (should rather append "b", i think).
>>>>> Maybe something is wrong here with character/pointer counting,
>>>>> but i'm somewhat out of time now for tracing.
>>>>>
>>>>> This is worth more investigation.
>>>>>
>>>>> Yours,
>>>>>   Ingo
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> ---------------------------------------------------------------------------------------------------------------------
>>>> () ascii ribbon campaign - against html e-mail
>>>> /\
>>
>> This dif fixes your problem here. Big question is of course: does it
>> break other cases?
>>
>> -Otto
>>
>> Index: process.c
>> ===================================================================
>> RCS file: /cvs/src/usr.bin/sed/process.c,v
>> retrieving revision 1.15
>> diff -u -p -r1.15 process.c
>> --- process.c 27 Oct 2009 23:59:43 -0000 1.15
>> +++ process.c 15 Jun 2011 06:31:08 -0000
>> @@ -336,7 +336,9 @@ substitute(struct s_command *cp)
>>   switch (n) {
>>   case 0: /* Global */
>>   do {
>> - if (lastempty || match[0].rm_so != match[0].rm_eo) {
>> + if (lastempty || match[0].rm_so != match[0].rm_eo ||
>> +    (match[0].rm_so == match[0].rm_eo &&
>> +    match[0].rm_so > 0)) {
>>   /* Locate start of replaced string. */
>>   re_off = match[0].rm_so;
>>   /* Copy leading retained string. */
>>
>
> Looks ok to me (I believe the problem was that prior to the intodution
> of -E, any matching '$' would always be in the first match.
>
> The diff doesn't break any of the regression tests (not that there are
> a lot of them). While at it, here's another one! :-)

Hmmm, looking closer I guess this should rather be a part of the sedtest
target...

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

Alexander Hall-3
In reply to this post by Otto Moerbeek
On 06/15/11 08:35, Otto Moerbeek wrote:

> On Wed, Jun 15, 2011 at 07:44:20AM +0200, Otto Moerbeek wrote:
>
>> On Tue, Jun 14, 2011 at 11:56:27PM +0200, sven falempin wrote:
>>
>>> Hello,
>>>
>>> Indeed there is a small problem:
>>>
>>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/[a$]/x/g'
>>> xbbbbbbbbbbbbbfffff
>>
>> That is expected. $ is only special when it ocurs as the list char of
>> a re.
>>
>>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/a|$/x/g'
>>> x
>>
>> This is likely to be a real bug.
>>
>>>
>>> String modification is done inside the 'case 0:'
>>> substitute(struct s_command *cp) in src/usr.bin/process.c
>>>
>>> But the problem may comme from regexec_e.
>>>
>>> Maybe openbsd devs should test another regexp code version ?
>>
>> Why? If we should change libs on every bug encountered, nothing will
>> be left.
>>
>> Anyway, thanks for the report.
>>
>> -Otto
>>
>>>
>>> Hope it helps,
>>> Who still use sed anyway :)
>>>
>>> Regards.
>>>
>>> 2011/6/12 Ingo Schwarze <[hidden email]>
>>>
>>>> Hi Nils,
>>>>
>>>> Nils Anspach wrote on Sun, Jun 12, 2011 at 12:49:42PM +0200:
>>>>
>>>>> I have an issue with sed. Why does
>>>>>
>>>>>       echo 'ab' | sed -E 's/a|$/x/g'
>>>>>
>>>>> give 'x' whereas
>>>>
>>>> I sense a bug here.
>>>> Tracing a bit around process(),
>>>> it looks like the first application of the s command
>>>> yields dst = "x" continue_to_process = "b\n",
>>>> and then the second application
>>>> appends "\n" to dst (should rather append "b", i think).
>>>> Maybe something is wrong here with character/pointer counting,
>>>> but i'm somewhat out of time now for tracing.
>>>>
>>>> This is worth more investigation.
>>>>
>>>> Yours,
>>>>   Ingo
>>>>
>>>>
>>>
>>>
>>> --
>>> ---------------------------------------------------------------------------------------------------------------------
>>> () ascii ribbon campaign - against html e-mail
>>> /\
>
> This dif fixes your problem here. Big question is of course: does it
> break other cases?

It differs from perl like this:

$ echo 'l1_1' | perl -pe 's/1|$/X/g'
lX_XX
$ echo 'l1_1' | sed -E 's/1|$/X/g'
lX_X

Meaning we don't hit that final '$' if the last match went to eol.

/Alexander

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

sven falempin
Hi,

So what 'sed' shall do when  match[0].rm_so i==0 ?

Regards.

2011/6/15 Alexander Hall <[hidden email]>

> On 06/15/11 08:35, Otto Moerbeek wrote:
> > On Wed, Jun 15, 2011 at 07:44:20AM +0200, Otto Moerbeek wrote:
> >
> >> On Tue, Jun 14, 2011 at 11:56:27PM +0200, sven falempin wrote:
> >>
> >>> Hello,
> >>>
> >>> Indeed there is a small problem:
> >>>
> >>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/[a$]/x/g'
> >>> xbbbbbbbbbbbbbfffff
> >>
> >> That is expected. $ is only special when it ocurs as the list char of
> >> a re.
> >>
> >>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/a|$/x/g'
> >>> x
> >>
> >> This is likely to be a real bug.
> >>
> >>>
> >>> String modification is done inside the 'case 0:'
> >>> substitute(struct s_command *cp) in src/usr.bin/process.c
> >>>
> >>> But the problem may comme from regexec_e.
> >>>
> >>> Maybe openbsd devs should test another regexp code version ?
> >>
> >> Why? If we should change libs on every bug encountered, nothing will
> >> be left.
> >>
> >> Anyway, thanks for the report.
> >>
> >>      -Otto
> >>
> >>>
> >>> Hope it helps,
> >>> Who still use sed anyway :)
> >>>
> >>> Regards.
> >>>
> >>> 2011/6/12 Ingo Schwarze <[hidden email]>
> >>>
> >>>> Hi Nils,
> >>>>
> >>>> Nils Anspach wrote on Sun, Jun 12, 2011 at 12:49:42PM +0200:
> >>>>
> >>>>> I have an issue with sed. Why does
> >>>>>
> >>>>>       echo 'ab' | sed -E 's/a|$/x/g'
> >>>>>
> >>>>> give 'x' whereas
> >>>>
> >>>> I sense a bug here.
> >>>> Tracing a bit around process(),
> >>>> it looks like the first application of the s command
> >>>> yields dst = "x" continue_to_process = "b\n",
> >>>> and then the second application
> >>>> appends "\n" to dst (should rather append "b", i think).
> >>>> Maybe something is wrong here with character/pointer counting,
> >>>> but i'm somewhat out of time now for tracing.
> >>>>
> >>>> This is worth more investigation.
> >>>>
> >>>> Yours,
> >>>>   Ingo
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>>
> ---------------------------------------------------------------------------------------------------------------------
> >>> () ascii ribbon campaign - against html e-mail
> >>> /\
> >
> > This dif fixes your problem here. Big question is of course: does it
> > break other cases?
>
> It differs from perl like this:
>
> $ echo 'l1_1' | perl -pe 's/1|$/X/g'
> lX_XX
> $ echo 'l1_1' | sed -E 's/1|$/X/g'
> lX_X
>
> Meaning we don't hit that final '$' if the last match went to eol.
>
> /Alexander
>



--
---------------------------------------------------------------------------------------------------------------------
() ascii ribbon campaign - against html e-mail
/\

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

Otto Moerbeek
In reply to this post by Alexander Hall-3
On Wed, Jun 15, 2011 at 09:50:29AM +0200, Alexander Hall wrote:

> On 06/15/11 08:35, Otto Moerbeek wrote:
> > On Wed, Jun 15, 2011 at 07:44:20AM +0200, Otto Moerbeek wrote:
> >
> >> On Tue, Jun 14, 2011 at 11:56:27PM +0200, sven falempin wrote:
> >>
> >>> Hello,
> >>>
> >>> Indeed there is a small problem:
> >>>
> >>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/[a$]/x/g'
> >>> xbbbbbbbbbbbbbfffff
> >>
> >> That is expected. $ is only special when it ocurs as the list char of
> >> a re.
> >>
> >>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/a|$/x/g'
> >>> x
> >>
> >> This is likely to be a real bug.
> >>
> >>>
> >>> String modification is done inside the 'case 0:'
> >>> substitute(struct s_command *cp) in src/usr.bin/process.c
> >>>
> >>> But the problem may comme from regexec_e.
> >>>
> >>> Maybe openbsd devs should test another regexp code version ?
> >>
> >> Why? If we should change libs on every bug encountered, nothing will
> >> be left.
> >>
> >> Anyway, thanks for the report.
> >>
> >> -Otto
> >>
> >>>
> >>> Hope it helps,
> >>> Who still use sed anyway :)
> >>>
> >>> Regards.
> >>>
> >>> 2011/6/12 Ingo Schwarze <[hidden email]>
> >>>
> >>>> Hi Nils,
> >>>>
> >>>> Nils Anspach wrote on Sun, Jun 12, 2011 at 12:49:42PM +0200:
> >>>>
> >>>>> I have an issue with sed. Why does
> >>>>>
> >>>>>       echo 'ab' | sed -E 's/a|$/x/g'
> >>>>>
> >>>>> give 'x' whereas
> >>>>
> >>>> I sense a bug here.
> >>>> Tracing a bit around process(),
> >>>> it looks like the first application of the s command
> >>>> yields dst = "x" continue_to_process = "b\n",
> >>>> and then the second application
> >>>> appends "\n" to dst (should rather append "b", i think).
> >>>> Maybe something is wrong here with character/pointer counting,
> >>>> but i'm somewhat out of time now for tracing.
> >>>>
> >>>> This is worth more investigation.
> >>>>
> >>>> Yours,
> >>>>   Ingo
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> ---------------------------------------------------------------------------------------------------------------------
> >>> () ascii ribbon campaign - against html e-mail
> >>> /\
> >
> > This dif fixes your problem here. Big question is of course: does it
> > break other cases?
>
> It differs from perl like this:
>
> $ echo 'l1_1' | perl -pe 's/1|$/X/g'
> lX_XX
> $ echo 'l1_1' | sed -E 's/1|$/X/g'
> lX_X
>
> Meaning we don't hit that final '$' if the last match went to eol.
>
> /Alexander

Right.

I took a look at freebsd, thay have some patches in this area. But
applying the changes did not have the desired effect. Have to look deeper.

        -Otto

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

Otto Moerbeek
On Wed, Jun 15, 2011 at 12:17:09PM +0200, Otto Moerbeek wrote:

> > It differs from perl like this:
> >
> > $ echo 'l1_1' | perl -pe 's/1|$/X/g'
> > lX_XX
> > $ echo 'l1_1' | sed -E 's/1|$/X/g'
> > lX_X
> >
> > Meaning we don't hit that final '$' if the last match went to eol.
> >
> > /Alexander
>
> Right.
>
> I took a look at freebsd, thay have some patches in this area. But
> applying the changes did not have the desired effect. Have to look deeper.
>
> -Otto

One conclusion: if I port the complete sed from freebsd, it also does
not substitute the eol in your testcase. So no wonder incorporating
the changes into OpenBSD does not fix this case.

        -Otto

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

Otto Moerbeek
In reply to this post by sven falempin
On Wed, Jun 15, 2011 at 12:02:44PM +0200, sven falempin wrote:

> Hi,
>
> So what 'sed' shall do when  match[0].rm_so i==0 ?

lastempty is true in that case.

        -Otto

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

Nicholas Marriott-2
In reply to this post by Otto Moerbeek
FWIW GNU sed also appears to have the same behaviour (different from
perl).



On Wed, Jun 15, 2011 at 12:26:39PM +0200, Otto Moerbeek wrote:

> On Wed, Jun 15, 2011 at 12:17:09PM +0200, Otto Moerbeek wrote:
>
> > > It differs from perl like this:
> > >
> > > $ echo 'l1_1' | perl -pe 's/1|$/X/g'
> > > lX_XX
> > > $ echo 'l1_1' | sed -E 's/1|$/X/g'
> > > lX_X
> > >
> > > Meaning we don't hit that final '$' if the last match went to eol.
> > >
> > > /Alexander
> >
> > Right.
> >
> > I took a look at freebsd, thay have some patches in this area. But
> > applying the changes did not have the desired effect. Have to look deeper.
> >
> > -Otto
>
> One conclusion: if I port the complete sed from freebsd, it also does
> not substitute the eol in your testcase. So no wonder incorporating
> the changes into OpenBSD does not fix this case.
>
> -Otto

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

Stuart Henderson
In reply to this post by Alexander Hall-3
On 2011-06-15, Alexander Hall <[hidden email]> wrote:

>> This dif fixes your problem here. Big question is of course: does it
>> break other cases?
>
> It differs from perl like this:
>
> $ echo 'l1_1' | perl -pe 's/1|$/X/g'
> lX_XX
> $ echo 'l1_1' | sed -E 's/1|$/X/g'
> lX_X
>
> Meaning we don't hit that final '$' if the last match went to eol.

perl regular expressions are expected to behave differently
to posix REs.

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

pdcvgmh
In reply to this post by Otto Moerbeek
On Wed, Jun 15, 2011 at 7:17 AM, Otto Moerbeek <[hidden email]> wrote:

> On Wed, Jun 15, 2011 at 09:50:29AM +0200, Alexander Hall wrote:
>
>> On 06/15/11 08:35, Otto Moerbeek wrote:
>> > On Wed, Jun 15, 2011 at 07:44:20AM +0200, Otto Moerbeek wrote:
>> >
>> >> On Tue, Jun 14, 2011 at 11:56:27PM +0200, sven falempin wrote:
>> >>
>> >>> Hello,
>> >>>
>> >>> Indeed there is a small problem:
>> >>>
>> >>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/[a$]/x/g'
>> >>> xbbbbbbbbbbbbbfffff
>> >>
>> >> That is expected. $ is only special when it ocurs as the list char of
>> >> a re.
>> >>
>> >>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/a|$/x/g'
>> >>> x
>> >>
>> >> This is likely to be a real bug.
>> >>
>> >>>
>> >>> String modification is done inside the 'case 0:'
>> >>> substitute(struct s_command *cp) in src/usr.bin/process.c
>> >>>
>> >>> But the problem may comme from regexec_e.
>> >>>
>> >>> Maybe openbsd devs should test another regexp code version ?
>> >>
>> >> Why? If we should change libs on every bug encountered, nothing will
>> >> be left.
>> >>
>> >> Anyway, thanks for the report.
>> >>
>> >>    -Otto
>> >>
>> >>>
>> >>> Hope it helps,
>> >>> Who still use sed anyway :)
>> >>>
>> >>> Regards.
>> >>>
>> >>> 2011/6/12 Ingo Schwarze <[hidden email]>
>> >>>
>> >>>> Hi Nils,
>> >>>>
>> >>>> Nils Anspach wrote on Sun, Jun 12, 2011 at 12:49:42PM +0200:
>> >>>>
>> >>>>> I have an issue with sed. Why does
>> >>>>>
>> >>>>>       echo 'ab' | sed -E 's/a|$/x/g'
>> >>>>>
>> >>>>> give 'x' whereas
>> >>>>
>> >>>> I sense a bug here.
>> >>>> Tracing a bit around process(),
>> >>>> it looks like the first application of the s command
>> >>>> yields dst = "x" continue_to_process = "b\n",
>> >>>> and then the second application
>> >>>> appends "\n" to dst (should rather append "b", i think).
>> >>>> Maybe something is wrong here with character/pointer counting,
>> >>>> but i'm somewhat out of time now for tracing.
>> >>>>
>> >>>> This is worth more investigation.
>> >>>>
>> >>>> Yours,
>> >>>>   Ingo
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>> --
>> >>> () ascii ribbon campaign - against html e-mail
>> >>> /\
>> >
>> > This dif fixes your problem here. Big question is of course: does it
>> > break other cases?
>>
>> It differs from perl like this:
>>
>> $ echo 'l1_1' | perl -pe 's/1|$/X/g'
>> lX_XX
>> $ echo 'l1_1' | sed -E 's/1|$/X/g'
>> lX_X
>>
>> Meaning we don't hit that final '$' if the last match went to eol.
>>
>> /Alexander
>
> Right.
>
> I took a look at freebsd, thay have some patches in this area. But
> applying the changes did not have the desired effect. Have to look deeper.
>
>        -Otto
>
>

HI Otto.

I think that with a minor change (>= instead of > in the last line)
your diff works.

Regards

+                       if (lastempty || match[0].rm_so != match[0].rm_eo ||
+                           (match[0].rm_so == match[0].rm_eo &&
+                           match[0].rm_so >= 0)) {

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

Otto Moerbeek
On Wed, Jun 15, 2011 at 11:41:51PM -0300, pablo caballero wrote:

> On Wed, Jun 15, 2011 at 7:17 AM, Otto Moerbeek <[hidden email]> wrote:
> > On Wed, Jun 15, 2011 at 09:50:29AM +0200, Alexander Hall wrote:
> >
> >> On 06/15/11 08:35, Otto Moerbeek wrote:
> >> > On Wed, Jun 15, 2011 at 07:44:20AM +0200, Otto Moerbeek wrote:
> >> >
> >> >> On Tue, Jun 14, 2011 at 11:56:27PM +0200, sven falempin wrote:
> >> >>
> >> >>> Hello,
> >> >>>
> >> >>> Indeed there is a small problem:
> >> >>>
> >> >>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/[a$]/x/g'
> >> >>> xbbbbbbbbbbbbbfffff
> >> >>
> >> >> That is expected. $ is only special when it ocurs as the list char of
> >> >> a re.
> >> >>
> >> >>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/a|$/x/g'
> >> >>> x
> >> >>
> >> >> This is likely to be a real bug.
> >> >>
> >> >>>
> >> >>> String modification is done inside the 'case 0:'
> >> >>> substitute(struct s_command *cp) in src/usr.bin/process.c
> >> >>>
> >> >>> But the problem may comme from regexec_e.
> >> >>>
> >> >>> Maybe openbsd devs should test another regexp code version ?
> >> >>
> >> >> Why? If we should change libs on every bug encountered, nothing will
> >> >> be left.
> >> >>
> >> >> Anyway, thanks for the report.
> >> >>
> >> >>    -Otto
> >> >>
> >> >>>
> >> >>> Hope it helps,
> >> >>> Who still use sed anyway :)
> >> >>>
> >> >>> Regards.
> >> >>>
> >> >>> 2011/6/12 Ingo Schwarze <[hidden email]>
> >> >>>
> >> >>>> Hi Nils,
> >> >>>>
> >> >>>> Nils Anspach wrote on Sun, Jun 12, 2011 at 12:49:42PM +0200:
> >> >>>>
> >> >>>>> I have an issue with sed. Why does
> >> >>>>>
> >> >>>>>       echo 'ab' | sed -E 's/a|$/x/g'
> >> >>>>>
> >> >>>>> give 'x' whereas
> >> >>>>
> >> >>>> I sense a bug here.
> >> >>>> Tracing a bit around process(),
> >> >>>> it looks like the first application of the s command
> >> >>>> yields dst = "x" continue_to_process = "b\n",
> >> >>>> and then the second application
> >> >>>> appends "\n" to dst (should rather append "b", i think).
> >> >>>> Maybe something is wrong here with character/pointer counting,
> >> >>>> but i'm somewhat out of time now for tracing.
> >> >>>>
> >> >>>> This is worth more investigation.
> >> >>>>
> >> >>>> Yours,
> >> >>>>   Ingo
> >> >>>>
> >> >>>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> () ascii ribbon campaign - against html e-mail
> >> >>> /\
> >> >
> >> > This dif fixes your problem here. Big question is of course: does it
> >> > break other cases?
> >>
> >> It differs from perl like this:
> >>
> >> $ echo 'l1_1' | perl -pe 's/1|$/X/g'
> >> lX_XX
> >> $ echo 'l1_1' | sed -E 's/1|$/X/g'
> >> lX_X
> >>
> >> Meaning we don't hit that final '$' if the last match went to eol.
> >>
> >> /Alexander
> >
> > Right.
> >
> > I took a look at freebsd, thay have some patches in this area. But
> > applying the changes did not have the desired effect. Have to look deeper.
> >
> >        -Otto
> >
> >
>
> HI Otto.
>
> I think that with a minor change (>= instead of > in the last line)
> your diff works.
>
> Regards
>
> +                       if (lastempty || match[0].rm_so != match[0].rm_eo ||
> +                           (match[0].rm_so == match[0].rm_eo &&
> +                           match[0].rm_so >= 0)) {

From a first look I'm not sure about that. YOu might end up doing a
second append later in the match[0].rm_so == 0 case.

        -Otto

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

sven falempin
Hello,

looks like it worked,
but what about this diff,just adding the new when rm_so is 0 (how to run the
regression test ?):


Index: process.c
===================================================================
RCS file: /cvs/src/usr.bin/sed/process.c,v
retrieving revision 1.15
diff -r1.15 process.c
89a90
>
250d250
<
353,362d352
<                       } else {
<                               if (match[0].rm_so == 0)
<                                       cspace(&SS, s, match[0].rm_so + 1,
<                                           APPEND);
<                               else
<                                       cspace(&SS, s + match[0].rm_so, 1,
<                                           APPEND);
<                               s += match[0].rm_so + 1;
<                               slen -= match[0].rm_so + 1;
<                               lastempty = 1;
363a354,362
>                       if (match[0].rm_so == 0)
>                               cspace(&SS, s, match[0].rm_so + 1,
>                                       APPEND);
>                       else
>                               cspace(&SS, s + match[0].rm_so, 1,
>                                       APPEND);
>                       s += match[0].rm_so + 1;
>                       slen -= match[0].rm_so + 1;
>                       lastempty = 1;


2011/6/16 Otto Moerbeek <[hidden email]>

> On Wed, Jun 15, 2011 at 11:41:51PM -0300, pablo caballero wrote:
>
> > On Wed, Jun 15, 2011 at 7:17 AM, Otto Moerbeek <[hidden email]> wrote:
> > > On Wed, Jun 15, 2011 at 09:50:29AM +0200, Alexander Hall wrote:
> > >
> > >> On 06/15/11 08:35, Otto Moerbeek wrote:
> > >> > On Wed, Jun 15, 2011 at 07:44:20AM +0200, Otto Moerbeek wrote:
> > >> >
> > >> >> On Tue, Jun 14, 2011 at 11:56:27PM +0200, sven falempin wrote:
> > >> >>
> > >> >>> Hello,
> > >> >>>
> > >> >>> Indeed there is a small problem:
> > >> >>>
> > >> >>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/[a$]/x/g'
> > >> >>> xbbbbbbbbbbbbbfffff
> > >> >>
> > >> >> That is expected. $ is only special when it ocurs as the list char
> of
> > >> >> a re.
> > >> >>
> > >> >>> # echo 'abbbbbbbbbbbbbfffff' | sed -E 's/a|$/x/g'
> > >> >>> x
> > >> >>
> > >> >> This is likely to be a real bug.
> > >> >>
> > >> >>>
> > >> >>> String modification is done inside the 'case 0:'
> > >> >>> substitute(struct s_command *cp) in src/usr.bin/process.c
> > >> >>>
> > >> >>> But the problem may comme from regexec_e.
> > >> >>>
> > >> >>> Maybe openbsd devs should test another regexp code version ?
> > >> >>
> > >> >> Why? If we should change libs on every bug encountered, nothing
> will
> > >> >> be left.
> > >> >>
> > >> >> Anyway, thanks for the report.
> > >> >>
> > >> >>    -Otto
> > >> >>
> > >> >>>
> > >> >>> Hope it helps,
> > >> >>> Who still use sed anyway :)
> > >> >>>
> > >> >>> Regards.
> > >> >>>
> > >> >>> 2011/6/12 Ingo Schwarze <[hidden email]>
> > >> >>>
> > >> >>>> Hi Nils,
> > >> >>>>
> > >> >>>> Nils Anspach wrote on Sun, Jun 12, 2011 at 12:49:42PM +0200:
> > >> >>>>
> > >> >>>>> I have an issue with sed. Why does
> > >> >>>>>
> > >> >>>>>       echo 'ab' | sed -E 's/a|$/x/g'
> > >> >>>>>
> > >> >>>>> give 'x' whereas
> > >> >>>>
> > >> >>>> I sense a bug here.
> > >> >>>> Tracing a bit around process(),
> > >> >>>> it looks like the first application of the s command
> > >> >>>> yields dst = "x" continue_to_process = "b\n",
> > >> >>>> and then the second application
> > >> >>>> appends "\n" to dst (should rather append "b", i think).
> > >> >>>> Maybe something is wrong here with character/pointer counting,
> > >> >>>> but i'm somewhat out of time now for tracing.
> > >> >>>>
> > >> >>>> This is worth more investigation.
> > >> >>>>
> > >> >>>> Yours,
> > >> >>>>   Ingo
> > >> >>>>
> > >> >>>>
> > >> >>>
> > >> >>>
> > >> >>> --
> > >> >>> () ascii ribbon campaign - against html e-mail
> > >> >>> /\
> > >> >
> > >> > This dif fixes your problem here. Big question is of course: does it
> > >> > break other cases?
> > >>
> > >> It differs from perl like this:
> > >>
> > >> $ echo 'l1_1' | perl -pe 's/1|$/X/g'
> > >> lX_XX
> > >> $ echo 'l1_1' | sed -E 's/1|$/X/g'
> > >> lX_X
> > >>
> > >> Meaning we don't hit that final '$' if the last match went to eol.
> > >>
> > >> /Alexander
> > >
> > > Right.
> > >
> > > I took a look at freebsd, thay have some patches in this area. But
> > > applying the changes did not have the desired effect. Have to look
> deeper.
> > >
> > >        -Otto
> > >
> > >
> >
> > HI Otto.
> >
> > I think that with a minor change (>= instead of > in the last line)
> > your diff works.
> >
> > Regards
> >
> > +                       if (lastempty || match[0].rm_so != match[0].rm_eo
> ||
> > +                           (match[0].rm_so == match[0].rm_eo &&
> > +                           match[0].rm_so >= 0)) {
>
> From a first look I'm not sure about that. YOu might end up doing a
> second append later in the match[0].rm_so == 0 case.
>
>        -Otto
>
>


--
---------------------------------------------------------------------------------------------------------------------
() ascii ribbon campaign - against html e-mail
/\

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

Ingo Schwarze
Hi Sven,

> but what about this diff,

1) Please use diff -up.
2) It doesn't apply to my tree.

If i understand your intention correctly, that is, replacing
the "} else {" by just "}" and doing the content of the else
clause unconditionally, that's plain wrong.  It makes sed
miss one out of two replacements:

 $ echo abcdef | ./obj/sed 's/./x/g'
xbxdxf

> just adding the new when rm_so is 0

Parse error.

> (how to run the regression test ?):

cd /usr/src/usr.bin/sed/
make cleandir
make obj
make depend
make
sudo make install
cd /usr/src/regress/usr.bin/sed/
make cleandir
make obj
make regress

It fails all over the place.

Yours,
  Ingo

> Index: process.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/sed/process.c,v
> retrieving revision 1.15
> diff -r1.15 process.c
> 89a90
> >
> 250d250
> <
> 353,362d352
> <                       } else {
> <                               if (match[0].rm_so == 0)
> <                                       cspace(&SS, s, match[0].rm_so + 1,
> <                                           APPEND);
> <                               else
> <                                       cspace(&SS, s + match[0].rm_so, 1,
> <                                           APPEND);
> <                               s += match[0].rm_so + 1;
> <                               slen -= match[0].rm_so + 1;
> <                               lastempty = 1;
> 363a354,362
> >                       if (match[0].rm_so == 0)
> >                               cspace(&SS, s, match[0].rm_so + 1,
> >                                       APPEND);
> >                       else
> >                               cspace(&SS, s + match[0].rm_so, 1,
> >                                       APPEND);
> >                       s += match[0].rm_so + 1;
> >                       slen -= match[0].rm_so + 1;
> >                       lastempty = 1;

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

Ingo Schwarze
In reply to this post by Otto Moerbeek
Hi Otto,

from careful code inspection, your patch looks correct.

Replacing the ">" by ">=" is wrong, it would breaks this test case:

  echo abc | ./obj/sed -E 's/(b|())/x/g'

However, the code is hard to follow, and i suggest the following
refactoring such that people doing a review can more easily check
the correctness of the code after application of the patch:

 1) X || Y || (!Y && Z) <=> X || Y || Z
    so leave out the middle + line of your patch.

 2) In case of match[0].rm_so == 0,
    the cspace in the if block you touch has no effect
    because copying zero bytes would be pointless.
    Thus, we can pull the cspace before the main if,
    optionally with an if (match[0].rm_so).
    That's also much easier to understand:
    *Always* copy the leading retained string, if any.

 3) In the else clause of "Move past this match",
    the if is completely pointless.
    For vanishing rm_so, both clauses do the same thing.

 4) Pull out the update of s and slen at the end.
    In the if checking whether the current match is empty,
    increment rm_eo exactly at the same time as copying the
    additional character.

The resulting code is much shorter and simpler,
and hence much easier to audit for correctness.

However, i'm a bit scared about touching this code.
Breaking this would be really bad.
I'm currently running a make build with it.

Yours,
  Ingo


> Index: process.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/sed/process.c,v
> retrieving revision 1.15
> diff -u -p -r1.15 process.c
> --- process.c 27 Oct 2009 23:59:43 -0000 1.15
> +++ process.c 15 Jun 2011 06:31:08 -0000
> @@ -336,7 +336,9 @@ substitute(struct s_command *cp)
>   switch (n) {
>   case 0: /* Global */
>   do {
> - if (lastempty || match[0].rm_so != match[0].rm_eo) {
> + if (lastempty || match[0].rm_so != match[0].rm_eo ||
> +    (match[0].rm_so == match[0].rm_eo &&
> +    match[0].rm_so > 0)) {
>   /* Locate start of replaced string. */
>   re_off = match[0].rm_so;
>   /* Copy leading retained string. */


Index: process.c
===================================================================
RCS file: /cvs/src/usr.bin/sed/process.c,v
retrieving revision 1.15
diff -u -p -r1.15 process.c
--- process.c 27 Oct 2009 23:59:43 -0000 1.15
+++ process.c 18 Jun 2011 18:23:00 -0000
@@ -336,31 +336,25 @@ substitute(struct s_command *cp)
  switch (n) {
  case 0: /* Global */
  do {
- if (lastempty || match[0].rm_so != match[0].rm_eo) {
- /* Locate start of replaced string. */
- re_off = match[0].rm_so;
- /* Copy leading retained string. */
- cspace(&SS, s, re_off, APPEND);
- /* Add in regular expression. */
+ /* Copy leading retained string. */
+ if (match[0].rm_so)
+ cspace(&SS, s, match[0].rm_so, APPEND);
+
+ /* Append replacement. */
+ if (lastempty || match[0].rm_so ||
+    match[0].rm_so != match[0].rm_eo)
  regsub(&SS, s, cp->u.s->new);
- }
 
- /* Move past this match. */
- if (match[0].rm_so != match[0].rm_eo) {
- s += match[0].rm_eo;
- slen -= match[0].rm_eo;
- lastempty = 0;
- } else {
- if (match[0].rm_so == 0)
- cspace(&SS, s, match[0].rm_so + 1,
-    APPEND);
- else
- cspace(&SS, s + match[0].rm_so, 1,
-    APPEND);
- s += match[0].rm_so + 1;
- slen -= match[0].rm_so + 1;
+ if (match[0].rm_so == match[0].rm_eo) {
+ cspace(&SS, s + match[0].rm_eo++, 1, APPEND);
  lastempty = 1;
- }
+ } else
+ lastempty = 0;
+
+ /* Move past this match. */
+ s += match[0].rm_eo;
+ slen -= match[0].rm_eo;
+
  } while (slen > 0 && regexec_e(re, s, REG_NOTBOL, 0, slen));
  /* Copy trailing retained string. */
  if (slen > 0)

Reply | Threaded
Open this post in threaded view
|

Re: sed behavior

sven falempin
Hello,

Well, i failed to apply your second patch :s (patch ./process.c ./yourdiff)
doing it manually (twice), it pass test but does not correct the initial
problem.
# echo 'aba' | sed -E 's/a|$/x/g'
xbx

Maybe i'm wrong ?

Nevertheless, my attempt is not that much better, it destroy the leading \n,
i ran the regression test this time.

# echo 'aba' | sed -E 's/a|$/x/g'
xbxx#
#

At least, it explain the purpose of lastempty ?

Regards.

Index: process.c
===================================================================
RCS file: /cvs/src/usr.bin/sed/process.c,v
retrieving revision 1.15
diff -u -p -r1.15 process.c
--- process.c   27 Oct 2009 23:59:43 -0000      1.15
+++ process.c   18 Jun 2011 20:50:28 -0000
@@ -336,31 +336,15 @@ substitute(struct s_command *cp)
        switch (n) {
        case 0:                                 /* Global */
                do {
-                       if (lastempty || match[0].rm_so != match[0].rm_eo) {
-                               /* Locate start of replaced string. */
-                               re_off = match[0].rm_so;
-                               /* Copy leading retained string. */
-                               cspace(&SS, s, re_off, APPEND);
-                               /* Add in regular expression. */
+                       if (match[0].rm_so)
+                               cspace(&SS, s, match[0].rm_so, APPEND);
+                       if (match[0].rm_so == match[0].rm_eo)
+                               cspace(&SS, s, match[0].rm_eo++, APPEND);
+                       if (match[0].rm_so || match[0].rm_so !=
match[0].rm_eo)
                                regsub(&SS, s, cp->u.s->new);
-                       }
-
                        /* Move past this match. */
-                       if (match[0].rm_so != match[0].rm_eo) {
-                               s += match[0].rm_eo;
-                               slen -= match[0].rm_eo;
-                               lastempty = 0;
-                       } else {
-                               if (match[0].rm_so == 0)
-                                       cspace(&SS, s, match[0].rm_so + 1,
-                                           APPEND);
-                               else
-                                       cspace(&SS, s + match[0].rm_so, 1,
-                                           APPEND);
-                               s += match[0].rm_so + 1;
-                               slen -= match[0].rm_so + 1;
-                               lastempty = 1;
-                       }
+                       s += match[0].rm_eo;
+                       slen -= match[0].rm_eo;
                } while (slen > 0 && regexec_e(re, s, REG_NOTBOL, 0, slen));
                /* Copy trailing retained string. */
                if (slen > 0)



2011/6/18 Ingo Schwarze <[hidden email]>

> Hi Otto,
>
> from careful code inspection, your patch looks correct.
>
> Replacing the ">" by ">=" is wrong, it would breaks this test case:
>
>  echo abc | ./obj/sed -E 's/(b|())/x/g'
>
> However, the code is hard to follow, and i suggest the following
> refactoring such that people doing a review can more easily check
> the correctness of the code after application of the patch:
>
>  1) X || Y || (!Y && Z) <=> X || Y || Z
>    so leave out the middle + line of your patch.
>
>  2) In case of match[0].rm_so == 0,
>    the cspace in the if block you touch has no effect
>    because copying zero bytes would be pointless.
>    Thus, we can pull the cspace before the main if,
>    optionally with an if (match[0].rm_so).
>    That's also much easier to understand:
>    *Always* copy the leading retained string, if any.
>
>  3) In the else clause of "Move past this match",
>    the if is completely pointless.
>    For vanishing rm_so, both clauses do the same thing.
>
>  4) Pull out the update of s and slen at the end.
>    In the if checking whether the current match is empty,
>    increment rm_eo exactly at the same time as copying the
>    additional character.
>
> The resulting code is much shorter and simpler,
> and hence much easier to audit for correctness.
>
> However, i'm a bit scared about touching this code.
> Breaking this would be really bad.
> I'm currently running a make build with it.
>
> Yours,
>  Ingo
>
>
> > Index: process.c
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/sed/process.c,v
> > retrieving revision 1.15
> > diff -u -p -r1.15 process.c
> > --- process.c 27 Oct 2009 23:59:43 -0000      1.15
> > +++ process.c 15 Jun 2011 06:31:08 -0000
> > @@ -336,7 +336,9 @@ substitute(struct s_command *cp)
> >       switch (n) {
> >       case 0:                                 /* Global */
> >               do {
> > -                     if (lastempty || match[0].rm_so != match[0].rm_eo)
> {
> > +                     if (lastempty || match[0].rm_so != match[0].rm_eo
> ||
> > +                         (match[0].rm_so == match[0].rm_eo &&
> > +                         match[0].rm_so > 0)) {
> >                               /* Locate start of replaced string. */
> >                               re_off = match[0].rm_so;
> >                               /* Copy leading retained string. */
>
>
> Index: process.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/sed/process.c,v
> retrieving revision 1.15
> diff -u -p -r1.15 process.c
> --- process.c   27 Oct 2009 23:59:43 -0000      1.15
> +++ process.c   18 Jun 2011 18:23:00 -0000
> @@ -336,31 +336,25 @@ substitute(struct s_command *cp)
>         switch (n) {
>        case 0:                                 /* Global */
>                do {
> -                       if (lastempty || match[0].rm_so != match[0].rm_eo)
> {
> -                               /* Locate start of replaced string. */
> -                               re_off = match[0].rm_so;
> -                               /* Copy leading retained string. */
> -                               cspace(&SS, s, re_off, APPEND);
> -                               /* Add in regular expression. */
> +                       /* Copy leading retained string. */
> +                       if (match[0].rm_so)
> +                               cspace(&SS, s, match[0].rm_so, APPEND);
> +
> +                       /* Append replacement. */
> +                       if (lastempty || match[0].rm_so ||
> +                           match[0].rm_so != match[0].rm_eo)
>                                regsub(&SS, s, cp->u.s->new);
> -                       }
>
> -                       /* Move past this match. */
> -                       if (match[0].rm_so != match[0].rm_eo) {
> -                               s += match[0].rm_eo;
> -                               slen -= match[0].rm_eo;
> -                               lastempty = 0;
> -                       } else {
> -                               if (match[0].rm_so == 0)
> -                                       cspace(&SS, s, match[0].rm_so + 1,
> -                                           APPEND);
> -                               else
> -                                       cspace(&SS, s + match[0].rm_so, 1,
> -                                           APPEND);
> -                               s += match[0].rm_so + 1;
> -                               slen -= match[0].rm_so + 1;
> +                       if (match[0].rm_so == match[0].rm_eo) {
> +                               cspace(&SS, s + match[0].rm_eo++, 1,
> APPEND);
>                                lastempty = 1;
> -                       }
> +                       } else
> +                               lastempty = 0;
> +
> +                       /* Move past this match. */
> +                       s += match[0].rm_eo;
> +                       slen -= match[0].rm_eo;
> +
>                } while (slen > 0 && regexec_e(re, s, REG_NOTBOL, 0, slen));
>                /* Copy trailing retained string. */
>                if (slen > 0)
>
>


--
---------------------------------------------------------------------------------------------------------------------
() ascii ribbon campaign - against html e-mail
/\

12