Panic with pf 'once' rule

Aaron Bieber
Hi,

Adding a rule similar to the one below causes a panic on -current (OpenBSD
6.5-current (GENERIC) #95: Thu Jul  4 21:22:25 MDT 2019). It also panics 6.3
and 6.5 (I didn't test 6.4):

  pass in quick on egress proto tcp from any to port 8888 once rdr-to \
        127.0.0.1 port 3333

Once the rule is in place, fire up a listener:

  nc -l 127.0.0.1 3333

Connect a few times from a remote machine:
 
  nc <ip> 8888

Eventually it panics with the trace below (sometimes it happens right away,
other times I have to restart nc a few times):

  ddb> trace
  pf_rm_rule(ffffffff81d900a8,ffff8000003bbfe8) at pf_rm_rule+0xa9
  pf_purge_rule(ffff8000003bbfe8) at pf_purge_rule+0x26
  pf_purge(ffffffff81dc1088) at pf_purge+0x55
  taskq_thread(ffff800000022040) at taskq_thread+0x3d
  end trace frame: 0x0, count: -4
  ddb>
  ddb> ps
     PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
   69502   12189      1      0  3    0x100083  ttyin         ksh
   53972  340673      1      0  3    0x100098  poll          cron
   81827  279222      1    110  3    0x100090  poll          sndiod
   54852   68160      1     99  3    0x100090  poll          sndiod
   79474   94554   3215     95  3    0x100092  kqread        smtpd
   90212  164878   3215    103  3    0x100092  kqread        smtpd
   43199  482512   3215     95  3    0x100092  kqread        smtpd
   38765  100663   3215     95  3    0x100092  kqread        smtpd
   33241  424770   3215     95  3    0x100092  kqread        smtpd
    5338  193750   3215     95  3    0x100092  kqread        smtpd
    3215  481909      1      0  3    0x100080  kqread        smtpd
   57742  143403      1      0  3        0x80  select        sshd
   31904  460143      1      0  3    0x100080  poll          ntpd
   65592  182120  47006     83  3    0x100092  poll          ntpd
   47006  103509      1     83  3    0x100092  poll          ntpd
   60875  292765  99617     74  3    0x100092  bpf           pflogd
   99617  524148      1      0  3        0x80  netio         pflogd
    4242  324170  49064     73  3    0x100090  kqread        syslogd
   49064  413359      1      0  3    0x100082  netio         syslogd
   20955  102995  68995    115  3    0x100092  kqread        slaacd
   99883  518930  68995    115  3    0x100092  kqread        slaacd
   68995  175540      1      0  3    0x100080  kqread        slaacd
    5278  238159      0      0  3     0x14200  pgzero        zerothread
    2253  479921      0      0  3     0x14200  aiodoned      aiodoned
   98149  310276      0      0  3     0x14200  syncer        update
   78055  259911      0      0  3     0x14200  cleaner       cleaner
   68827  324781      0      0  3     0x14200  reaper        reaper
   93269   98863      0      0  3     0x14200  pgdaemon      pagedaemon
   75284  447451      0      0  3     0x14200  bored         crynlk
   34868  513191      0      0  3     0x14200  bored         crypto
  *18776  255193      0      0  7     0x14200                softnet
   64918  469356      0      0  3     0x14200  bored         systqmp
     902   49537      0      0  3     0x14200  bored         systq
   17250  200730      0      0  3  0x40014200  bored         softclock
    2990  510299      0      0  3  0x40014200                idle0
     947  215447      0      0  3     0x14200  bored         smr
       1  180680      0      0  3        0x82  wait          init
       0       0     -1      0  3     0x10200  scheduler     swapper
  ddb>

dmesg (from a VM in vmm - I have also reproduced this on physical hw):
OpenBSD 6.5-current (GENERIC) #95: Thu Jul  4 21:22:25 MDT 2019
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 4278181888 (4079MB)
avail mem = 4138524672 (3946MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf3f10 (12 entries)
bios0: vendor SeaBIOS version "1.11.0p2-OpenBSD-vmm" date 01/01/2011
bios0: OpenBSD VMM
acpi at bios0 not configured
cpu0 at mainbus0: (uniprocessor)
cpu0: AMD Ryzen 7 PRO 2700U w/ Radeon Vega Mobile Gfx, 37466.79 MHz, 17-11-00
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,CX8,SEP,PGE,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,LONG,LAHF,CMPLEG,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA
cpu0: 64KB 64b/line 4-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache, 4MB 64b/line 16-way L3 cache
cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
pvbus0 at mainbus0: OpenBSD
pvclock0 at pvbus0
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "OpenBSD VMM Host" rev 0x00
virtio0 at pci0 dev 1 function 0 "Qumranet Virtio RNG" rev 0x00
viornd0 at virtio0
virtio0: irq 3
virtio1 at pci0 dev 2 function 0 "Qumranet Virtio Network" rev 0x00
vio0 at virtio1: address fe:e1:bb:d1:eb:4d
virtio1: irq 5
virtio2 at pci0 dev 3 function 0 "Qumranet Virtio Storage" rev 0x00
vioblk0 at virtio2
scsibus1 at vioblk0: 2 targets
sd0 at scsibus1 targ 0 lun 0: <VirtIO, Block Device, > SCSI3 0/direct fixed
sd0: 40960MB, 512 bytes/sector, 83886080 sectors
virtio2: irq 6
virtio3 at pci0 dev 4 function 0 "OpenBSD VMM Control" rev 0x00
vmmci0 at virtio3
virtio3: irq 7
isa0 at mainbus0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns8250, no fifo
com0: console
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (66c460169c410440.a) swap on sd0b dump on sd0b
WARNING: / was not properly unmounted

--
PGP: 0x1F81112D62A9ADCE / 3586 3350 BFEA C101 DB1A  4AF0 1F81 112D 62A9 ADCE

Re: Panic with pf 'once' rule

Mike Belopuhov

Aaron Bieber writes:

> Hi,
>
> Adding a rule similar to the below causes a panic on -current (OpenBSD
> 6.5-current (GENERIC) #95: Thu Jul  4 21:22:25 MDT 2019). This also panics 6.3
> and 6.5 (I didn't test 6.4):
>
>   pass in quick on egress proto tcp from any to port 8888 once rdr-to \
> 127.0.0.1 port 3333
>
> Once the rule is in place, fire up:
>
>   nc -l 127.0.0.1 3333
>
> Connect a few times from a remote machine:
>  
>   nc <ip> 8888
>
> Eventually it will panic with (sometimes it happens right away, other times I
> have to restart nc a few times):
>

This is because 'once' is meant to be used inside an anchor
(the rule is removed once it has matched).

The most sensible way to use it is to put the rule into an anchor
inside a recursive anchor (e.g. 'relayd/*'), as sketched below.
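
A minimal sketch of that pattern (the sub-anchor name 'relayd/demo' and
the ports are made up for this example): the main ruleset evaluates
every sub-anchor under 'relayd/', and the one-shot rule is loaded into
a sub-anchor on demand, e.g. by relayd:

  # pf.conf: evaluate all sub-anchors under "relayd/"
  anchor "relayd/*"

  # load a once rule into a sub-anchor; it goes away after it matches
  echo "pass in quick proto tcp to port 8888 once rdr-to 127.0.0.1 port 3333" \
      | pfctl -a "relayd/demo" -f -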

It's possible that the check protecting the system from
the kind of misuse you've described here got lost during
refactoring, or that it never existed in the first place :-(

Cheers,
Mike

Re: Panic with pf 'once' rule

Aaron Bieber
I get the panic when the rule is in an anchor as well.

PGP: 0x1F81112D62A9ADCE / 3586 3350 BFEA C101 DB1A 4AF0 1F81 112D 62A9 ADCE

> On Jul 7, 2019, at 4:39 PM, Mike Belopuhov <[hidden email]> wrote:
> 
> This is because it's meant to be used inside of an anchor
> (it removes the rule once it's matched).
>
> The most sensible way to use it is to put it into the anchor
> inside a recursive anchor (e.g. 'relayd/*').
>
> It's possible that the check protecting the system from
> the misuse like you've described here got lost during
> refactoring or it never existed in the first place :-(
>
> Cheers,
> Mike

Re: Panic with pf 'once' rule

Lawrence Teo
On Tue, Jul 09, 2019 at 03:35:26PM -0600, Aaron Bieber wrote:

> I get the panic when the rule is in an anchor as well.

[cc'ing sashan@ because this relates to pf.c r1.983]

I can replicate the crash as well.  The crash only happens if more than
one connection that matches the once rule is made in a short period of
time.  abieber@ triggered this when he ran "nc <ip> 8888" a few times.
After going through the code, I think I found the cause and wrote a
possible fix.

As of pf.c r1.983, once rules are removed by the purge thread instead of
immediately when a packet matches the rule.  The latest code is in
pf_test_rule():

        if (r->rule_flag & PFRULE_ONCE) {
                r->rule_flag |= PFRULE_EXPIRED;
                r->exptime = time_second;
                SLIST_INSERT_HEAD(&pf_rule_gcl, r, gcle);
        }

When a once rule is matched by a packet for the first time, the rule is
marked as expired and added to pf_rule_gcl.  Eventually the purge thread
is run, which calls pf_purge_expired_rules() to purge the expired rules
in pf_rule_gcl:

        while ((r = SLIST_FIRST(&pf_rule_gcl)) != NULL) {
                SLIST_REMOVE(&pf_rule_gcl, r, pf_rule, gcle);
                KASSERT(r->rule_flag & PFRULE_EXPIRED);
                pf_purge_rule(r);
        }

Since the purge thread only runs periodically, there is a short time
window where a once rule is expired but not yet purged.  If a second new
connection that matches the once rule is made in that time window, the
pf_test_rule() code block is executed again, causing the same once rule
to be added to pf_rule_gcl again.

As a result, when the purge thread finally runs, it tries to remove the
same once rule multiple times, resulting in a crash, as seen in the
backtrace:

pf_rm_rule(ffffffff81d900a8,ffff8000003bbfe8) at pf_rm_rule+0xa9
pf_purge_rule(ffff8000003bbfe8) at pf_purge_rule+0x26
pf_purge(ffffffff81dc1088) at pf_purge+0x55
taskq_thread(ffff800000022040) at taskq_thread+0x3d
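
To illustrate the failure mode, here is a minimal userland sketch
(assuming <sys/queue.h>; "struct rule" and the loop are stand-ins for
the kernel code, not the real thing).  The second SLIST_INSERT_HEAD()
of the same element makes it its own successor, so the purge loop never
drains the list; in the kernel, pf_purge_rule() frees the rule on the
first pass, so the next pass operates on freed memory:

/*
 * Userland sketch only: after the double insert, r.gcle.sle_next
 * points back at &r, creating a one-node cycle.
 */
#include <sys/queue.h>
#include <stdio.h>

struct rule {
        SLIST_ENTRY(rule) gcle;
};

SLIST_HEAD(, rule) pf_rule_gcl = SLIST_HEAD_INITIALIZER(pf_rule_gcl);

int
main(void)
{
        struct rule r;
        int pass = 0;

        SLIST_INSERT_HEAD(&pf_rule_gcl, &r, gcle);
        SLIST_INSERT_HEAD(&pf_rule_gcl, &r, gcle);      /* the bug */

        /* Mimics pf_purge_expired_rules(); the list never empties. */
        while (!SLIST_EMPTY(&pf_rule_gcl) && pass < 3) {
                struct rule *p = SLIST_FIRST(&pf_rule_gcl);

                SLIST_REMOVE(&pf_rule_gcl, p, rule, gcle);
                printf("purging the same rule, pass %d\n", ++pass);
        }
        return (0);
}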

There is also a related bug: in that expired-but-not-purged time window,
pf_test_rule() still creates state for new connections that match the
once rule, even though it has already expired.  In other words, the once
rule is not truly a one-shot rule during that window.

To fix both bugs, I wrote this diff that adds a check in pf_test_rule()
to prevent expired once rules from being added to pf_rule_gcl.  The
check is added "early" in pf_test_rule() to prevent any further
connections from creating state if they match the expired once rule.

I have sent this diff to abieber@ who confirmed that it fixes the crash
for him.

Thoughts? ok?


Index: pf.c
===================================================================
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.1084
diff -u -p -r1.1084 pf.c
--- pf.c 9 Jul 2019 11:30:19 -0000 1.1084
+++ pf.c 12 Jul 2019 02:54:33 -0000
@@ -3862,6 +3862,13 @@ pf_test_rule(struct pf_pdesc *pd, struct
         if (r->action == PF_DROP)
                 goto cleanup;
 
+       /*
+        * If an expired "once" rule has not been purged, drop any new matching
+        * packets to prevent the rule from being added to pf_rule_gcl again.
+        */
+       if (r->rule_flag & PFRULE_EXPIRED)
+               goto cleanup;
+
         pf_tag_packet(pd->m, ctx.tag, ctx.act.rtableid);
         if (ctx.act.rtableid >= 0 &&
             rtable_l2(ctx.act.rtableid) != pd->rdomain)


Re: Panic with pf 'once' rule

Alexandr Nedvedicky
Hello,

</snip>

>
> To fix both bugs, I wrote this diff that adds a check in pf_test_rule()
> to prevent expired once rules from being added to pf_rule_gcl.  The
> check is added "early" in pf_test_rule() to prevent any further
> connections from creating state if they match the expired once rule.
>
> I have sent this diff to abieber@ who confirmed that it fixes the crash
> for him.
>
> Thoughts? ok?
>

    thanks for the detailed analysis; it is correct.

    your change looks good and makes sense. However I would ask for one more
    tweak to your fix:

--------8<---------------8<---------------8<------------------8<--------
diff --git a/sys/net/pf.c b/sys/net/pf.c
index 9e454e5c941..df75d7e4acf 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -3860,6 +3860,13 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm,
        if (r->action == PF_DROP)
                goto cleanup;
 
+       /*
+        * If an expired "once" rule has not been purged, drop any new matching
+        * packets.
+        */
+       if (r->rule_flag & PFRULE_EXPIRED)
+               goto cleanup;
+
        pf_tag_packet(pd->m, ctx.tag, ctx.act.rtableid);
        if (ctx.act.rtableid >= 0 &&
            rtable_l2(ctx.act.rtableid) != pd->rdomain)
@@ -3945,9 +3952,18 @@ pf_test_rule(struct pf_pdesc *pd, struct pf_rule **rm, struct pf_state **sm,
 #endif /* NPFSYNC > 0 */
 
        if (r->rule_flag & PFRULE_ONCE) {
-               r->rule_flag |= PFRULE_EXPIRED;
-               r->exptime = time_second;
-               SLIST_INSERT_HEAD(&pf_rule_gcl, r, gcle);
+               u_int32_t       rule_flag;
+
+               /*
+                * Use atomic_cas() to determine a clear winner, which will
+                * insert an expired rule to gcl.
+                */
+               rule_flag = r->rule_flag;
+               if (atomic_cas_uint(&r->rule_flag, rule_flag,
+                   rule_flag | PFRULE_EXPIRED) == rule_flag) {
+                       r->exptime = time_second;
+                       SLIST_INSERT_HEAD(&pf_rule_gcl, r, gcle);
+               }
        }
 
        return (action);
--------8<---------------8<---------------8<------------------8<--------

    As soon as there are more PF tasks running in parallel, we would be
    able to hit a crash similar to the one you are fixing now. The rules
    are covered by a read lock, so with more PF tasks there might be two
    packets racing to expire the same once rule at the same time. Using
    atomic_cas() is a sufficient measure to serialize the competing
    packets; a userland sketch of the idiom is below.
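
For illustration, a minimal userland sketch of that single-winner idiom,
using C11 atomics in place of the kernel's atomic_cas_uint(); the flag
value and the claim_expiry() helper are made up for this example:

/*
 * Of any set of racing callers, exactly one sees its compare-and-swap
 * succeed and gets to do the one-time work (inserting into the gcl).
 */
#include <stdatomic.h>
#include <stdbool.h>

#define PFRULE_EXPIRED  0x00000400      /* hypothetical bit for this sketch */

static _Atomic unsigned int rule_flag;

static bool
claim_expiry(void)
{
        unsigned int old = atomic_load(&rule_flag);

        if (old & PFRULE_EXPIRED)
                return (false);         /* another packet already expired it */
        /* Succeeds only if rule_flag still equals 'old'; one racer wins. */
        return (atomic_compare_exchange_strong(&rule_flag, &old,
            old | PFRULE_EXPIRED));
}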

    you have my OK with/without the suggested tweak.

thanks and
regards
sashan


Re: Panic with pf 'once' rule

Lawrence Teo
On Mon, Jul 15, 2019 at 11:06:50AM +0200, Alexandr Nedvedicky wrote:

>     you have my OK with/without the suggested tweak.

Thank you!  I have committed the fix with your tweak.

Lawrence