awk FS behaviour change

Stuart Henderson
The Sep 10, 2019 version of awk introduced a change in handling this:

  ifconfig egress | awk '/inet / {FS="[ .]"; print "host-"$4"-"$5}'

Given a line like

        inet 10.20.30.40 netmask 0xffffff00 broadcast 10.20.30.255

it used to return host-30-40, now it returns host-0xffffff00-broadcast.
The new behaviour is the same as gawk and old behaviour can be obtained
by doing this instead

  ifconfig egress | awk -F '[ .]' '/inet / {print "host-"$4"-"$5}'

I don't know which is "correct" (the manpage isn't enlightening) but
it was a bit unexpected so I wanted to at least draw attention to it
in case it breaks somebody else's script.
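
To try this without a live interface, here is a minimal sketch that feeds the sample line above to awk through printf. The helper name sample_line and the leading tab are assumptions made for illustration; real ifconfig output may differ.

```shell
# Hypothetical stand-in for `ifconfig egress`; the leading tab is an assumption.
sample_line() {
    printf '\tinet 10.20.30.40 netmask 0xffffff00 broadcast 10.20.30.255\n'
}

# FS given with -F is in effect before the first record is read,
# so the address is split on blanks and dots.
with_F=$(sample_line | awk -F '[ .]' '/inet / { print "host-" $4 "-" $5 }')

# With the default FS the record is split on runs of whitespace only.
default_split=$(sample_line | awk '/inet / { print "host-" $4 "-" $5 }')

echo "$with_F"          # host-30-40
echo "$default_split"   # host-0xffffff00-broadcast
```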


Re: awk FS behaviour change

Todd C. Miller
On Fri, 26 Jun 2020 21:41:57 +0100, Stuart Henderson wrote:

> The Sep 10, 2019 version of awk introduced a change in handling this:
>
>   ifconfig egress | awk '/inet / {FS="[ .]"; print "host-"$4"-"$5}'
>
> Given a line like
>
>         inet 10.20.30.40 netmask 0xffffff00 broadcast 10.20.30.255
>
> it used to return host-30-40, now it returns host-0xffffff00-broadcast.
> The new behaviour is the same as gawk and old behaviour can be obtained
> by doing this instead
>
>   ifconfig egress | awk -F '[ .]' '/inet / {print "host-"$4"-"$5}'
>
> I don't know which is "correct" (the manpage isn't enlightening) but
> it was a bit unexpected so I wanted to at least draw attention to it
> in case it breaks somebody else's script.

The current behavior is correct.  The old behavior was a bug because
field splitting is supposed to use the value of FS at the time the
record was read.  Setting FS after the line has been read is too
late.

You need to either set FS via the -F flag or inside a BEGIN block.
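
As a rough sketch of the BEGIN form, assuming a canned input line in place of real ifconfig output (the function name egress_line and the leading tab are made up for this example):

```shell
# Hypothetical stand-in for `ifconfig egress`; the leading tab is an assumption.
egress_line() {
    printf '\tinet 10.20.30.40 netmask 0xffffff00 broadcast 10.20.30.255\n'
}

# The BEGIN block runs before any input is read, so the first record
# is already split with the new FS.
from_begin=$(egress_line | awk 'BEGIN { FS = "[ .]" } /inet / { print "host-" $4 "-" $5 }')

echo "$from_begin"   # host-30-40
```

An assignment to FS inside an ordinary action, by contrast, only applies to records read after it.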

 - todd


Re: awk FS behaviour change

Todd C. Miller
In reply to this post by Stuart Henderson
On Fri, 26 Jun 2020 21:41:57 +0100, Stuart Henderson wrote:

> I don't know which is "correct" (the manpage isn't enlightening) but
> it was a bit unexpected so I wanted to at least draw attention to it
> in case it breaks somebody else's script.

The awk manual leaves a lot of things unspecified (buy the book ;-).
Does this addition help clear things up?

 - todd

Index: awk.1
===================================================================
RCS file: /cvs/src/usr.bin/awk/awk.1,v
retrieving revision 1.53
diff -u -p -u -r1.53 awk.1
--- awk.1 17 Jun 2020 15:34:11 -0000 1.53
+++ awk.1 26 Jun 2020 21:39:19 -0000
@@ -140,6 +140,16 @@ refers to the entire line.
 If
 .Va FS
 is null, the input line is split into one field per character.
+Lines are split into fields using the value of
+.Va FS
+at the time the line is read.
+Because of this,
+.Va FS
+is usually set via the
+.Fl F
+option or inside of a
+.Ic BEGIN
+block.
 .Pp
 Normally, any number of blanks separate fields.
 In order to set the field separator to a single blank, use the


Re: awk FS behaviour change

Klemens Nanni
On Fri, Jun 26, 2020 at 03:41:21PM -0600, Todd C. Miller wrote:
> The awk manual leaves a lot of things unspecified (buy the book ;-).
> Does this addition help clear things up?
Yes.
 

> Index: awk.1
> ===================================================================
> RCS file: /cvs/src/usr.bin/awk/awk.1,v
> retrieving revision 1.53
> diff -u -p -u -r1.53 awk.1
> --- awk.1 17 Jun 2020 15:34:11 -0000 1.53
> +++ awk.1 26 Jun 2020 21:39:19 -0000
> @@ -140,6 +140,16 @@ refers to the entire line.
>  If
>  .Va FS
>  is null, the input line is split into one field per character.
> +Lines are split into fields using the value of
> +.Va FS
> +at the time the line is read.
> +Because of this,
> +.Va FS
> +is usually set via the
> +.Fl F
> +option or inside of a
> +.Ic BEGIN
> +block.
>  .Pp
>  Normally, any number of blanks separate fields.
>  In order to set the field separator to a single blank, use the

Given that this amends the following paragraph, your first sentence
seems repetitive:

        An input line is normally made up of fields separated by whitespace, or
        by the regular expression FS.  The fields are denoted $1, $2, ..., while
        $0 refers to the entire line.  If FS is null, the input line is split
        into one field per character.

How about adding something like "Therefore, FS should be set with -F or
in a BEGIN block before input is read." as second sentence in this
paragraph?


Re: awk FS behaviour change

Todd C. Miller
On Fri, 26 Jun 2020 23:56:23 +0200, Klemens Nanni wrote:

> How about adding something like "Therefore, FS should be set with -F or
> in a BEGIN block before input is read." as second sentence in this
> paragraph?

That whole section is missing important details.  I've tried to add
the missing info without being too repetitive.
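
The difference between the default separator and a single blank, which the patch below spells out, can be seen with a small sketch like this one. The variable names and the two-space sample string are only illustrative.

```shell
# Sample record with two consecutive spaces between "a" and "b".
record='a  b'

# Default FS (a single space): any run of whitespace separates fields.
default_nf=$(printf '%s\n' "$record" | awk '{ print NF }')

# FS as the regular expression [ ]: every single blank is a separator,
# so the two spaces enclose an empty field.
single_nf=$(printf '%s\n' "$record" | awk -F '[ ]' '{ print NF }')

echo "$default_nf"   # 2
echo "$single_nf"    # 3
```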

 - todd

Index: usr.bin/awk/awk.1
===================================================================
RCS file: /cvs/src/usr.bin/awk/awk.1,v
retrieving revision 1.54
diff -u -p -u -r1.54 awk.1
--- usr.bin/awk/awk.1 26 Jun 2020 21:50:06 -0000 1.54
+++ usr.bin/awk/awk.1 27 Jun 2020 03:25:48 -0000
@@ -129,27 +129,25 @@ and newlines are used as field separator
 .Va FS ) .
 This is convenient when working with multi-line records.
 .Pp
-An input line is normally made up of fields separated by whitespace,
-or by the regular expression
-.Va FS .
+An input line is normally made up of fields split based on the value
+of the field separator
+.Va FS
+at the time the line is read.
 The fields are denoted
 .Va $1 , $2 , ... ,
 while
 .Va $0
 refers to the entire line.
-If
 .Va FS
-is null, the input line is split into one field per character.
-Lines are split into fields using the value of
+may be set to either a single character or a regular expression.
+As a special case, if
 .Va FS
-at the time the line is read.
-Because of this,
+is a single space
+.Pq the default ,
+fields will be split by one or more whitespace characters.
+If
 .Va FS
-is usually set via the
-.Fl F
-option or inside of a
-.Ic BEGIN
-block.
+is null, the input line is split into one field per character.
 .Pp
 Normally, any number of blanks separate fields.
 In order to set the field separator to a single blank, use the
@@ -171,6 +169,11 @@ as the field separator, use the
 .Fl F
 option with a value of
 .Sq [t] .
+The field separator is usually set via the
+.Fl F
+option or from inside of a
+.Ic BEGIN
+block so that it takes effect before the input is read.
 .Pp
 A pattern-action statement has the form:
 .Pp
@@ -407,9 +410,9 @@ The name of the current input file.
 .It Va FNR
 Ordinal number of the current record in the current file.
 .It Va FS
-Regular expression used to separate fields; also settable
-by option
-.Fl F Ar fs .
+Regular expression used to separate fields (default whitespace);
+also settable by option
+.Fl F Ar fs
 .It Va NF
 Number of fields in the current record.
 .Va $NF


Re: awk FS behaviour change

Jason McIntyre
On Fri, Jun 26, 2020 at 09:28:00PM -0600, Todd C. Miller wrote:

> On Fri, 26 Jun 2020 23:56:23 +0200, Klemens Nanni wrote:
>
> > How about adding something like "Therefore, FS should be set with -F or
> > in a BEGIN block before input is read." as second sentence in this
> > paragraph?
>
> That whole section is missing important details.  I've tried to add
> the missing info without being too repetitive.
>
>  - todd
>
> Index: usr.bin/awk/awk.1
> ===================================================================
> RCS file: /cvs/src/usr.bin/awk/awk.1,v
> retrieving revision 1.54
> diff -u -p -u -r1.54 awk.1
> --- usr.bin/awk/awk.1 26 Jun 2020 21:50:06 -0000 1.54
> +++ usr.bin/awk/awk.1 27 Jun 2020 03:25:48 -0000
> @@ -129,27 +129,25 @@ and newlines are used as field separator
>  .Va FS ) .
>  This is convenient when working with multi-line records.
>  .Pp
> -An input line is normally made up of fields separated by whitespace,
> -or by the regular expression
> -.Va FS .
> +An input line is normally made up of fields split based on the value
> +of the field separator
> +.Va FS
> +at the time the line is read.

i'm not sure it reads better when we switch the emphasis from whitespace
to FS. i think it's better that people see how it normally works, then
the gories about FS. so i'd have kept the first part of the sentence,
but maybe reworked the FS bit.

>  The fields are denoted
>  .Va $1 , $2 , ... ,
>  while
>  .Va $0
>  refers to the entire line.
> -If
>  .Va FS
> -is null, the input line is split into one field per character.
> -Lines are split into fields using the value of
> +may be set to either a single character or a regular expression.
> +As a special case, if
>  .Va FS
> -at the time the line is read.
> -Because of this,
> +is a single space
> +.Pq the default ,
> +fields will be split by one or more whitespace characters.
> +If
>  .Va FS
> -is usually set via the
> -.Fl F
> -option or inside of a
> -.Ic BEGIN
> -block.
> +is null, the input line is split into one field per character.
>  .Pp
>  Normally, any number of blanks separate fields.
>  In order to set the field separator to a single blank, use the
> @@ -171,6 +169,11 @@ as the field separator, use the
>  .Fl F
>  option with a value of
>  .Sq [t] .
> +The field separator is usually set via the
> +.Fl F
> +option or from inside of a

that sounds odd, but it may be a US/UK thing: i would say either "from
inside a block" or "from the inside of a block".

jmc

> +.Ic BEGIN
> +block so that it takes effect before the input is read.
>  .Pp
>  A pattern-action statement has the form:
>  .Pp
> @@ -407,9 +410,9 @@ The name of the current input file.
>  .It Va FNR
>  Ordinal number of the current record in the current file.
>  .It Va FS
> -Regular expression used to separate fields; also settable
> -by option
> -.Fl F Ar fs .
> +Regular expression used to separate fields (default whitespace);
> +also settable by option
> +.Fl F Ar fs
>  .It Va NF
>  Number of fields in the current record.
>  .Va $NF
>


Re: awk FS behaviour change

patrick keshishian
On Sat, Jun 27, 2020 at 06:50:39AM +0100, Jason McIntyre wrote:

> On Fri, Jun 26, 2020 at 09:28:00PM -0600, Todd C. Miller wrote:
> > On Fri, 26 Jun 2020 23:56:23 +0200, Klemens Nanni wrote:
> >
> > > How about adding something like "Therefore, FS should be set with -F or
> > > in a BEGIN block before input is read." as second sentence in this
> > > paragraph?
> >
> > That whole section is missing important details.  I've tried to add
> > the missing info without being too repetitive.
> >
> >  - todd
> >
> > Index: usr.bin/awk/awk.1
> > ===================================================================
> > RCS file: /cvs/src/usr.bin/awk/awk.1,v
> > retrieving revision 1.54
> > diff -u -p -u -r1.54 awk.1
> > --- usr.bin/awk/awk.1 26 Jun 2020 21:50:06 -0000 1.54
> > +++ usr.bin/awk/awk.1 27 Jun 2020 03:25:48 -0000
> > @@ -129,27 +129,25 @@ and newlines are used as field separator
> >  .Va FS ) .
> >  This is convenient when working with multi-line records.
> >  .Pp
> > -An input line is normally made up of fields separated by whitespace,
> > -or by the regular expression
> > -.Va FS .
> > +An input line is normally made up of fields split based on the value
> > +of the field separator
> > +.Va FS
> > +at the time the line is read.
>
> i'm not sure it reads better when we switch the emphasis from whitespace
> to FS. i think it's better that people see how it normally works, then
> the gories about FS. so i'd have kept the first part of the sentence,
> but maybe reworked the FS bit.
>
> >  The fields are denoted
> >  .Va $1 , $2 , ... ,
> >  while
> >  .Va $0
> >  refers to the entire line.
> > -If
> >  .Va FS
> > -is null, the input line is split into one field per character.
> > -Lines are split into fields using the value of
> > +may be set to either a single character or a regular expression.
> > +As a special case, if
> >  .Va FS
> > -at the time the line is read.
> > -Because of this,
> > +is a single space
> > +.Pq the default ,
> > +fields will be split by one or more whitespace characters.
> > +If
> >  .Va FS
> > -is usually set via the
> > -.Fl F
> > -option or inside of a
> > -.Ic BEGIN
> > -block.
> > +is null, the input line is split into one field per character.
> >  .Pp
> >  Normally, any number of blanks separate fields.
> >  In order to set the field separator to a single blank, use the
> > @@ -171,6 +169,11 @@ as the field separator, use the
> >  .Fl F
> >  option with a value of
> >  .Sq [t] .
> > +The field separator is usually set via the
> > +.Fl F
> > +option or from inside of a
>
> that sounds odd, but it may be a US/UK thing: i would say either "from
> inside a block" or "from the inside of a block".

Maybe "... from inside of the" rather than "... from inside of a"

--patrick

>
> jmc
>
> > +.Ic BEGIN
> > +block so that it takes effect before the input is read.
> >  .Pp
> >  A pattern-action statement has the form:
> >  .Pp
> > @@ -407,9 +410,9 @@ The name of the current input file.
> >  .It Va FNR
> >  Ordinal number of the current record in the current file.
> >  .It Va FS
> > -Regular expression used to separate fields; also settable
> > -by option
> > -.Fl F Ar fs .
> > +Regular expression used to separate fields (default whitespace);
> > +also settable by option
> > +.Fl F Ar fs
> >  .It Va NF
> >  Number of fields in the current record.
> >  .Va $NF
> >
>


Re: awk FS behaviour change

Todd C. Miller
In reply to this post by Jason McIntyre
On Sat, 27 Jun 2020 06:50:39 +0100, Jason McIntyre wrote:

> i'm not sure it reads better when we switch the emphasis from whitespace
> to FS. i think it's better that people see how it normally works, then
> the gories about FS. so i'd have kept the first part of the sentence,
> but maybe reworked the FS bit.

I wasn't sure that was an improvement either.  Does this seem better?

 - todd

Index: usr.bin/awk/awk.1
===================================================================
RCS file: /cvs/src/usr.bin/awk/awk.1,v
retrieving revision 1.54
diff -u -p -u -r1.54 awk.1
--- usr.bin/awk/awk.1 26 Jun 2020 21:50:06 -0000 1.54
+++ usr.bin/awk/awk.1 27 Jun 2020 12:29:21 -0000
@@ -130,26 +130,24 @@ and newlines are used as field separator
 This is convenient when working with multi-line records.
 .Pp
 An input line is normally made up of fields separated by whitespace,
-or by the regular expression
-.Va FS .
+or by the value of the field separator
+.Va FS
+at the time the line is read.
 The fields are denoted
 .Va $1 , $2 , ... ,
 while
 .Va $0
 refers to the entire line.
-If
 .Va FS
-is null, the input line is split into one field per character.
-Lines are split into fields using the value of
+may be set to either a single character or a regular expression.
+As a special case, if
 .Va FS
-at the time the line is read.
-Because of this,
+is a single space
+.Pq the default ,
+fields will be split by one or more whitespace characters.
+If
 .Va FS
-is usually set via the
-.Fl F
-option or inside of a
-.Ic BEGIN
-block.
+is null, the input line is split into one field per character.
 .Pp
 Normally, any number of blanks separate fields.
 In order to set the field separator to a single blank, use the
@@ -171,6 +169,11 @@ as the field separator, use the
 .Fl F
 option with a value of
 .Sq [t] .
+The field separator is usually set via the
+.Fl F
+option or from inside a
+.Ic BEGIN
+block so that it takes effect before the input is read.
 .Pp
 A pattern-action statement has the form:
 .Pp
@@ -407,9 +410,9 @@ The name of the current input file.
 .It Va FNR
 Ordinal number of the current record in the current file.
 .It Va FS
-Regular expression used to separate fields; also settable
-by option
-.Fl F Ar fs .
+Regular expression used to separate fields (default whitespace);
+also settable by option
+.Fl F Ar fs
 .It Va NF
 Number of fields in the current record.
 .Va $NF


Re: awk FS behaviour change

Klemens Nanni
On Sat, Jun 27, 2020 at 06:32:11AM -0600, Todd C. Miller wrote:
> I wasn't sure that was an improvement either.  Does this seem better?
To me it does, thanks.

OK kn

> Index: usr.bin/awk/awk.1
> ===================================================================
> RCS file: /cvs/src/usr.bin/awk/awk.1,v
> retrieving revision 1.54
> diff -u -p -u -r1.54 awk.1
> --- usr.bin/awk/awk.1 26 Jun 2020 21:50:06 -0000 1.54
> +++ usr.bin/awk/awk.1 27 Jun 2020 12:29:21 -0000
> @@ -130,26 +130,24 @@ and newlines are used as field separator
>  This is convenient when working with multi-line records.
>  .Pp
>  An input line is normally made up of fields separated by whitespace,
> -or by the regular expression
> -.Va FS .
> +or by the value of the field separator
> +.Va FS
> +at the time the line is read.
>  The fields are denoted
>  .Va $1 , $2 , ... ,
>  while
>  .Va $0
>  refers to the entire line.
> -If
>  .Va FS
> -is null, the input line is split into one field per character.
> -Lines are split into fields using the value of
> +may be set to either a single character or a regular expression.
> +As a special case, if
>  .Va FS
> -at the time the line is read.
> -Because of this,
> +is a single space
> +.Pq the default ,
.Pq is probably not needed here; at the end you're also just using
"(default whitespace)".

> +fields will be split by one or more whitespace characters.
> +If
>  .Va FS
> -is usually set via the
> -.Fl F
> -option or inside of a
> -.Ic BEGIN
> -block.
> +is null, the input line is split into one field per character.
>  .Pp
>  Normally, any number of blanks separate fields.
>  In order to set the field separator to a single blank, use the
> @@ -171,6 +169,11 @@ as the field separator, use the
>  .Fl F
>  option with a value of
>  .Sq [t] .
> +The field separator is usually set via the
> +.Fl F
> +option or from inside a
> +.Ic BEGIN
> +block so that it takes effect before the input is read.
>  .Pp
>  A pattern-action statement has the form:
>  .Pp
> @@ -407,9 +410,9 @@ The name of the current input file.
>  .It Va FNR
>  Ordinal number of the current record in the current file.
>  .It Va FS
> -Regular expression used to separate fields; also settable
> -by option
> -.Fl F Ar fs .
> +Regular expression used to separate fields (default whitespace);
> +also settable by option
> +.Fl F Ar fs
Missing dot here (with trailing space after "fs").

>  .It Va NF
>  Number of fields in the current record.
>  .Va $NF
>


Re: awk FS behaviour change

Jason McIntyre
In reply to this post by Todd C. Miller
On Sat, Jun 27, 2020 at 06:32:11AM -0600, Todd C. Miller wrote:

> On Sat, 27 Jun 2020 06:50:39 +0100, Jason McIntyre wrote:
>
> > i'm not sure it reads better when we switch the emphasis from whitespace
> > to FS. i think it's better that people see how it normally works, then
> > the gories about FS. so i'd have kept the first part of the sentence,
> > but maybe reworked the FS bit.
>
> I wasn't sure that was an improvement either.  Does this seem better?
>
>  - todd
>

yes, i think this is better. ok by me.
jmc

> Index: usr.bin/awk/awk.1
> ===================================================================
> RCS file: /cvs/src/usr.bin/awk/awk.1,v
> retrieving revision 1.54
> diff -u -p -u -r1.54 awk.1
> --- usr.bin/awk/awk.1 26 Jun 2020 21:50:06 -0000 1.54
> +++ usr.bin/awk/awk.1 27 Jun 2020 12:29:21 -0000
> @@ -130,26 +130,24 @@ and newlines are used as field separator
>  This is convenient when working with multi-line records.
>  .Pp
>  An input line is normally made up of fields separated by whitespace,
> -or by the regular expression
> -.Va FS .
> +or by the value of the field separator
> +.Va FS
> +at the time the line is read.
>  The fields are denoted
>  .Va $1 , $2 , ... ,
>  while
>  .Va $0
>  refers to the entire line.
> -If
>  .Va FS
> -is null, the input line is split into one field per character.
> -Lines are split into fields using the value of
> +may be set to either a single character or a regular expression.
> +As a special case, if
>  .Va FS
> -at the time the line is read.
> -Because of this,
> +is a single space
> +.Pq the default ,
> +fields will be split by one or more whitespace characters.
> +If
>  .Va FS
> -is usually set via the
> -.Fl F
> -option or inside of a
> -.Ic BEGIN
> -block.
> +is null, the input line is split into one field per character.
>  .Pp
>  Normally, any number of blanks separate fields.
>  In order to set the field separator to a single blank, use the
> @@ -171,6 +169,11 @@ as the field separator, use the
>  .Fl F
>  option with a value of
>  .Sq [t] .
> +The field separator is usually set via the
> +.Fl F
> +option or from inside a
> +.Ic BEGIN
> +block so that it takes effect before the input is read.
>  .Pp
>  A pattern-action statement has the form:
>  .Pp
> @@ -407,9 +410,9 @@ The name of the current input file.
>  .It Va FNR
>  Ordinal number of the current record in the current file.
>  .It Va FS
> -Regular expression used to separate fields; also settable
> -by option
> -.Fl F Ar fs .
> +Regular expression used to separate fields (default whitespace);
> +also settable by option
> +.Fl F Ar fs
>  .It Va NF
>  Number of fields in the current record.
>  .Va $NF
>