Performance impact of PF on APU2

Performance impact of PF on APU2

Benjamin Petit
I am trying to set up a PC Engines APU2C2 as a router using OpenBSD. Using the latest snapshots of CURRENT, with pf disabled, it seems capable of routing at near gigabit speeds, but when enabling pf (with the default config file), I cannot get a bandwidth of more than 450/440 Mbit/s between the two segments of my LAN. That seems to be a huge drop using default rules. Enabling NAT doesn't seem to drop performance further.

Before upgrading to CURRENT, I think routing with or without pf enabled was around 600Mbit/s, but I would need to reinstall to test again.

Method of test: iperf3 between a PC in network 192.168.1.0/24 and another in network 192.168.42.0/24. I try with 1 connection, 10 connections, and then 20 connections. I know this is not a perfect routing test, but that's what I can easily test for now.
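For reference, runs like the ones described can be reproduced with something like this (the server address below is an assumption for illustration, not taken from the thread):

```shell
# On the PC in 192.168.42.0/24 (assumed address .10): run the server
iperf3 -s

# On the PC in 192.168.1.0/24: 1, 10, and 20 parallel TCP streams
iperf3 -c 192.168.42.10
iperf3 -c 192.168.42.10 -P 10
iperf3 -c 192.168.42.10 -P 20
```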

I am an OpenBSD newbie, so I am not sure where to look to see the bottleneck. I know that the APU2 is not very powerful, but I expected a bit more than that, with simple pf rules.

Thanks,


Re: Performance impact of PF on APU2

Zbyszek Żółkiewski
It was discussed before on this mailing list. There is an ongoing effort to make pf more performant on multicore setups (as I understand it). There are other impacts too; e.g. enabling queueing causes a ~100 Mbit/s drop in processing speed on a 1 Gbps link.

Zbyszek

> On 03.10.2018 at 06:13, Benjamin Petit <[hidden email]> wrote:


Re: Performance impact of PF on APU2

Benjamin Petit
In reply to this post by Benjamin Petit
Thanks, I just saw the previous discussion, from late 2017.

Do you know where we can follow the work that is being done? I would be more than happy to test early versions.


Re: Performance impact of PF on APU2

Tom Smyth
Hello,

Your forwarding performance will vary based on a few things. At the minute routing is MP-safe, but if one of the LAN ports, say em1, is in a bridge, then the forwarding is done by a single core.
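As a hypothetical illustration of the bridge case described here (the interface names are made up, not from the thread), a config like this in /etc/hostname.bridge0 would put em1's traffic on the single-core bridge path:

```
# /etc/hostname.bridge0 -- hypothetical example only;
# forwarding across this bridge is handled by a single core
add em1
add em2
up
```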

My testing on OpenBSD 6.3 showed speeds of 750-800 Mb/s with default rules using x86-64 GENERIC (not i386).

Speeds generally fell when playing with encapsulation. I was using a test rig as follows:

apu2c2 iperf client ----- <apu2c2 router> ----- apu2c2 iperf server

I hope this helps

TomSmyth
On Wed, 3 Oct 2018 at 19:04, Benjamin Petit <[hidden email]> wrote:

--
Kindest regards,
Tom Smyth

Mobile: +353 87 6193172
The information contained in this E-mail is intended only for the
confidential use of the named recipient. If the reader of this message
is not the intended recipient or the person responsible for
delivering it to the recipient, you are hereby notified that you have
received this communication in error and that any review,
dissemination or copying of this communication is strictly prohibited.
If you have received this in error, please notify the sender
immediately by telephone at the number above and erase the message
You are requested to carry out your own virus check before
opening any attachment.


Re: Performance impact of PF on APU2

Stuart Henderson
In reply to this post by Benjamin Petit
On 2018-10-03, Benjamin Petit <[hidden email]> wrote:
> Before upgrading to CURRENT, I think routing with or without pf
> enabled was around 600Mbit/s, but I would need to reinstall to test
> again.

Snapshots are usually built with the "pool_debug" kernel option; releases are built without it. This is good for finding some types of bugs, but it can have a performance impact. You could try sysctl kern.pool_debug=0 and see if that improves performance.
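If disabling pool_debug turns out to help, the setting can be made persistent across reboots via /etc/sysctl.conf:

```
# /etc/sysctl.conf -- applied at boot
kern.pool_debug=0
```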

What were you running before (including syspatches if present)?
If it was from before mitigations for CPU bugs were added, those are
generally expected to slow things down.



Re: Performance impact of PF on APU2

Benjamin Petit
Hello, and thanks for your responses!

> My testing on OpenBSD 6.3  showed speeds of 750/s - 800Mb/s
> with default rules using    x86-64 GENERIC (not i386)

Same setup as yours, and I definitely don't reach 750-800 Mbit/s (550 at best).

When I transfer a big file from one network to the other, I clearly see that
one core stays pretty much at 100%. I don't have any bridge configured.

> Snapshots are usually built with the "pool_debug" kernel option,
> releases are built without it. This is good for finding some types of
> bug, but can have an impact, you could try sysctl kern.pool_debug=0
> and see if that improves performance.

No significant impact.

> What were you running before (including syspatches if present)?
> If it was from before mitigations for CPU bugs were added, those are
> generally expected to slow things down.

All syspatches are up to date (as of Monday at least). The BIOS is up to date,
so I suppose that mitigations were already in place?

I will reinstall 6.3 + latest syspatches and measure again.

Currently I see that sometimes iperf3 needs to retry sending some
packets. I don't see any dropped packets in sysctl.
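For anyone hunting the retransmits, a few standard places to look for drops on the router itself (these are stock OpenBSD commands, not something suggested in the thread):

```shell
# Per-interface input/output errors and drop counters
netstat -id
# Protocol-level counters mentioning drops
netstat -s | grep -i drop
# Drops on the IP input queue
sysctl net.inet.ip.ifq.drops
```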




Re: Performance impact of PF on APU2

Benjamin Petit
OK, so I compared 6.3-release, 6.3-release + syspatches (= stable?), and the latest snapshot from October 2.

I measured iperf3 throughput between A and B, like this:
PC A <---> APU2 <---> PC B

pf rules are the ones shipped by default in 6.3:

  gw# pfctl -sr                                                                  
  block return all
  pass all flags S/SA
  block return in on ! lo0 proto tcp from any to any port 6000:6010
  block return out log proto tcp all user = 55
  block return out log proto udp all user = 55

OpenBSD 6.3 RELEASE:  
  - pf enabled:  841 Mbits/sec 
  - pf disabled: 935 Mbits/sec

OpenBSD 6.3 + Syspatch:
  - pf enabled:  803 Mbits/sec
  - pf disabled: 936 Mbits/sec

OpenBSD CURRENT:
  - pf enabled: 526 Mbits/sec (541 with kern.pool_debug=0)
  - pf disabled: 934 Mbits/sec

So there is a small perf drop when applying all syspatches to 6.3 (not sure which one causes the drop),
but performance drops SIGNIFICANTLY using the latest snapshot.

Am I missing something? (I really hope I am)
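One experiment that might narrow this down (not something tried in the thread): benchmark against a minimal stateless ruleset, to separate the cost of pf's state table from the cost of rule evaluation. This is a test-only config, not safe for real use:

```
# /etc/pf.conf -- TEST ONLY: passes everything without creating state
pass all no state
```

Load it with pfctl -f /etc/pf.conf and rerun iperf3; if throughput recovers, the overhead is in state handling rather than rule matching.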


Re: Performance impact of PF on APU2

Tom Smyth
Can you show us a copy of your sysctl output?

Check if SMT is disabled (Hyper-Threading).

I'm not sure if this would have an effect on the
APU2C2, but it is worth checking as it is a change
in behaviour between 6.3 and current AFAIK.
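Checking this takes one command on a recent snapshot (hw.smt was added shortly before this thread; 0 means SMT is disabled):

```shell
# Show whether simultaneous multithreading is enabled (0 = off)
sysctl hw.smt
# Number of CPUs the kernel found
sysctl hw.ncpu
```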

Thanks

Tom Smyth
On Thu, 4 Oct 2018 at 04:58, Benjamin Petit <[hidden email]> wrote:



Re: Performance impact of PF on APU2

Benjamin Petit
Added my sysctl output as an attachment (I have never really used
mailing lists before...)

I don't think the APU2 uses HT, but I tried with sysctl hw.smt=1;
no difference in the iperf3 numbers.

On Thu, 2018-10-04 at 05:02 +0100, Tom Smyth wrote:


Re: Performance impact of PF on APU2

Benjamin Petit
In reply to this post by Tom Smyth
My sysctl output:

kern.ostype=OpenBSD
kern.osrelease=6.4
kern.osrevision=201811
kern.version=OpenBSD 6.4 (GENERIC.MP) #342: Tue Oct  2 23:23:09 MDT 2018
    [hidden email]:/usr/src/sys/arch/amd64/compile/GENERIC.MP

kern.maxvnodes=22282
kern.maxproc=1310
kern.maxfiles=7030
kern.argmax=262144
kern.securelevel=1
kern.hostname=gw.home.******
kern.hostid=0
kern.clockrate=tick = 10000, tickadj = 40, hz = 100, profhz = 100,
stathz = 100
kern.dnsjackport=0
kern.posix1version=200809
kern.ngroups=16
kern.job_control=1
kern.saved_ids=1
kern.boottime=Wed Oct  3 20:45:34 2018
kern.domainname=
kern.maxpartitions=16
kern.rawpartition=2
kern.maxthread=1950
kern.nthreads=48
kern.osversion=GENERIC.MP#342
kern.somaxconn=128
kern.sominconn=80
kern.nosuidcoredump=1
kern.fsync=1
kern.sysvmsg=1
kern.sysvsem=1
kern.sysvshm=1
kern.msgbufsize=98256
kern.malloc.buckets=16,32,64,128,256,512,1024,2048,4096,8192,16384,3276
8,65536,131072,262144,524288
kern.malloc.bucket.16=(calls = 1722 total_allocated = 768 total_free =
393 elements = 256 high watermark = 1280 could_free = 0)
kern.malloc.bucket.32=(calls = 4386 total_allocated = 896 total_free =
623 elements = 128 high watermark = 640 could_free = 0)
kern.malloc.bucket.64=(calls = 7536 total_allocated = 1472 total_free =
1113 elements = 64 high watermark = 320 could_free = 876)
kern.malloc.bucket.128=(calls = 9964 total_allocated = 3296 total_free
= 27 elements = 32 high watermark = 160 could_free = 3)
kern.malloc.bucket.256=(calls = 4129 total_allocated = 112 total_free =
6 elements = 16 high watermark = 80 could_free = 0)
kern.malloc.bucket.512=(calls = 1621 total_allocated = 144 total_free =
6 elements = 8 high watermark = 40 could_free = 0)
kern.malloc.bucket.1024=(calls = 2460 total_allocated = 172 total_free
= 5 elements = 4 high watermark = 20 could_free = 55)
kern.malloc.bucket.2048=(calls = 58 total_allocated = 36 total_free = 0
elements = 2 high watermark = 10 could_free = 0)
kern.malloc.bucket.4096=(calls = 2780 total_allocated = 1047 total_free
= 2 elements = 1 high watermark = 5 could_free = 0)
kern.malloc.bucket.8192=(calls = 394 total_allocated = 216 total_free =
3 elements = 1 high watermark = 5 could_free = 0)
kern.malloc.bucket.16384=(calls = 372 total_allocated = 5 total_free =
0 elements = 1 high watermark = 5 could_free = 0)
kern.malloc.bucket.32768=(calls = 10 total_allocated = 9 total_free = 0
elements = 1 high watermark = 5 could_free = 0)
kern.malloc.bucket.65536=(calls = 1540 total_allocated = 3 total_free =
0 elements = 1 high watermark = 5 could_free = 0)
kern.malloc.bucket.131072=(calls = 3 total_allocated = 3 total_free = 0
elements = 1 high watermark = 5 could_free = 0)
kern.malloc.bucket.262144=(calls = 0 total_allocated = 0 total_free = 0
elements = 1 high watermark = 5 could_free = 0)
kern.malloc.bucket.524288=(calls = 1 total_allocated = 1 total_free = 0
elements = 1 high watermark = 5 could_free = 0)
kern.malloc.kmemnames=free,,devbuf,,pcb,rtable,,,,ifaddr,soopts,sysctl,
counters,,ioctlops,,,,,iov,mount,,NFS_req,NFS_mount,,vnodes,namecache,U
FS_quota,UFS_mount,shm,VM_map,sem,dirhash,ACPI,VM_pmap,,,,file,file_des
c,,proc,subproc,VFS_cluster,,,MFS_node,,,Export_Host,NFS_srvsock,,NFS_d
aemon,ip_moptions,in_multi,ether_multi,mrt,ISOFS_mount,ISOFS_node,MSDOS
FS_mount,MSDOSFS_fat,MSDOSFS_node,ttys,exec,miscfs_mount,fusefs_mount,,
,,,,,,,pfkey_data,tdb,xform_data,,pagedep,inodedep,newblk,,,indirdep,,,
,,,,,,VM_swap,,,,,,UVM_amap,UVM_aobj,,USB,USB_device,USB_HC,,memdesc,,,
crypto_data,,IPsec_creds,,,,emuldata,,,,,,,,,ip6_options,NDP,,,temp,NTF
S_mount,NTFS_node,NTFS_fnode,NTFS_dir,NTFS_hash,NTFS_attr,NTFS_data,NTF
S_decomp,NTFS_vrun,kqueue,,SYN_cache,UDF_mount,UDF_file_entry,UDF_file_
id,,AGP_Memory,DRM
kern.malloc.kmemstat.free=(inuse = 0, calls = 0, memuse = 0K, limblocks
= 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes =
(none))
kern.malloc.kmemstat.devbuf=(inuse = 2071, calls = 3032, memuse =
4663K, limblocks = 0, mapblocks = 0, maxused = 4664K, limit = 78644K,
spare = 0, sizes =
(16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536,131072))
kern.malloc.kmemstat.pcb=(inuse = 76, calls = 114, memuse = 16K,
limblocks = 0, mapblocks = 0, maxused = 17K, limit = 78644K, spare = 0,
sizes = (16,32,128,1024))
kern.malloc.kmemstat.rtable=(inuse = 71, calls = 157, memuse = 3K,
limblocks = 0, mapblocks = 0, maxused = 3K, limit = 78644K, spare = 0,
sizes = (16,32,64,128,256))
kern.malloc.kmemstat.ifaddr=(inuse = 52, calls = 53, memuse = 11K,
limblocks = 0, mapblocks = 0, maxused = 11K, limit = 78644K, spare = 0,
sizes = (32,64,128,256,4096))
kern.malloc.kmemstat.soopts=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.sysctl=(inuse = 3, calls = 3, memuse = 2K,
limblocks = 0, mapblocks = 0, maxused = 2K, limit = 78644K, spare = 0,
sizes = (32,128,1024))
kern.malloc.kmemstat.counters=(inuse = 79, calls = 79, memuse = 67K,
limblocks = 0, mapblocks = 0, maxused = 67K, limit = 78644K, spare = 0,
sizes = (64,128,256,512,1024,4096,8192))
kern.malloc.kmemstat.ioctlops=(inuse = 0, calls = 2216, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 4K, limit = 78644K, spare = 0,
sizes = (256,512,1024,2048,4096))
kern.malloc.kmemstat.iov=(inuse = 0, calls = 0, memuse = 0K, limblocks
= 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes =
(none))
kern.malloc.kmemstat.mount=(inuse = 9, calls = 9, memuse = 9K,
limblocks = 0, mapblocks = 0, maxused = 9K, limit = 78644K, spare = 0,
sizes = (1024))
kern.malloc.kmemstat.NFS_req=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.NFS_mount=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.vnodes=(inuse = 35, calls = 1188, memuse = 3K,
limblocks = 0, mapblocks = 0, maxused = 74K, limit = 78644K, spare = 0,
sizes = (64,128,256))
kern.malloc.kmemstat.namecache=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.UFS_quota=(inuse = 1, calls = 1, memuse = 32K,
limblocks = 0, mapblocks = 0, maxused = 32K, limit = 78644K, spare = 0,
sizes = (32768))
kern.malloc.kmemstat.UFS_mount=(inuse = 37, calls = 37, memuse = 80K,
limblocks = 0, mapblocks = 0, maxused = 80K, limit = 78644K, spare = 0,
sizes = (16,32,64,512,2048,8192,32768))
kern.malloc.kmemstat.shm=(inuse = 2, calls = 2, memuse = 2K, limblocks
= 0, mapblocks = 0, maxused = 2K, limit = 78644K, spare = 0, sizes =
(256,1024))
kern.malloc.kmemstat.VM_map=(inuse = 2, calls = 2, memuse = 1K,
limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0,
sizes = (256))
kern.malloc.kmemstat.sem=(inuse = 2, calls = 2, memuse = 1K, limblocks
= 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0, sizes =
(32,128))
kern.malloc.kmemstat.dirhash=(inuse = 90, calls = 117, memuse = 19K,
limblocks = 0, mapblocks = 0, maxused = 19K, limit = 78644K, spare = 0,
sizes = (16,32,64,128,256,512))
kern.malloc.kmemstat.ACPI=(inuse = 2590, calls = 7506, memuse = 304K,
limblocks = 0, mapblocks = 0, maxused = 325K, limit = 78644K, spare =
0, sizes = (16,32,64,128,256,512,1024,2048))
kern.malloc.kmemstat.VM_pmap=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.file=(inuse = 0, calls = 0, memuse = 0K, limblocks
= 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes =
(none))
kern.malloc.kmemstat.file_desc=(inuse = 1, calls = 1, memuse = 1K,
limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0,
sizes = (512))
kern.malloc.kmemstat.proc=(inuse = 61, calls = 404, memuse = 48K,
limblocks = 0, mapblocks = 0, maxused = 72K, limit = 78644K, spare = 0,
sizes = (16,64,1024,4096,8192))
kern.malloc.kmemstat.subproc=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.VFS_cluster=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.MFS_node=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.Export_Host=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.NFS_srvsock=(inuse = 1, calls = 1, memuse = 1K,
limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0,
sizes = (128))
kern.malloc.kmemstat.NFS_daemon=(inuse = 1, calls = 1, memuse = 16K,
limblocks = 0, mapblocks = 0, maxused = 16K, limit = 78644K, spare = 0,
sizes = (16384))
kern.malloc.kmemstat.ip_moptions=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.in_multi=(inuse = 15, calls = 15, memuse = 1K,
limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0,
sizes = (32,64,128))
kern.malloc.kmemstat.ether_multi=(inuse = 2, calls = 2, memuse = 1K,
limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0,
sizes = (32))
kern.malloc.kmemstat.mrt=(inuse = 0, calls = 0, memuse = 0K, limblocks
= 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes =
(none))
kern.malloc.kmemstat.ISOFS_mount=(inuse = 1, calls = 1, memuse = 32K,
limblocks = 0, mapblocks = 0, maxused = 32K, limit = 78644K, spare = 0,
sizes = (32768))
kern.malloc.kmemstat.ISOFS_node=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.MSDOSFS_mount=(inuse = 1, calls = 1, memuse = 16K,
limblocks = 0, mapblocks = 0, maxused = 16K, limit = 78644K, spare = 0,
sizes = (16384))
kern.malloc.kmemstat.MSDOSFS_fat=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.MSDOSFS_node=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.ttys=(inuse = 390, calls = 390, memuse = 1723K,
limblocks = 0, mapblocks = 0, maxused = 1723K, limit = 78644K, spare =
0, sizes = (512,1024,8192))
kern.malloc.kmemstat.exec=(inuse = 0, calls = 1022, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 2K, limit = 78644K, spare = 0,
sizes = (16,32,256,1024))
kern.malloc.kmemstat.miscfs_mount=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.fusefs_mount=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.pfkey_data=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.tdb=(inuse = 0, calls = 0, memuse = 0K, limblocks
= 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes =
(none))
kern.malloc.kmemstat.xform_data=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.pagedep=(inuse = 1, calls = 1, memuse = 8K,
limblocks = 0, mapblocks = 0, maxused = 8K, limit = 78644K, spare = 0,
sizes = (8192))
kern.malloc.kmemstat.inodedep=(inuse = 1, calls = 1, memuse = 32K,
limblocks = 0, mapblocks = 0, maxused = 32K, limit = 78644K, spare = 0,
sizes = (32768))
kern.malloc.kmemstat.newblk=(inuse = 1, calls = 1, memuse = 1K,
limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0,
sizes = (512))
kern.malloc.kmemstat.indirdep=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.VM_swap=(inuse = 7, calls = 7, memuse = 159K,
limblocks = 0, mapblocks = 0, maxused = 159K, limit = 78644K, spare =
0, sizes = (16,64,2048,131072))
kern.malloc.kmemstat.UVM_amap=(inuse = 219, calls = 6116, memuse = 11K,
limblocks = 0, mapblocks = 0, maxused = 31K, limit = 78644K, spare = 0,
sizes = (16,32,64,128,256,8192))
kern.malloc.kmemstat.UVM_aobj=(inuse = 2, calls = 2, memuse = 3K,
limblocks = 0, mapblocks = 0, maxused = 3K, limit = 78644K, spare = 0,
sizes = (16,2048))
kern.malloc.kmemstat.USB=(inuse = 84, calls = 95, memuse = 47K,
limblocks = 0, mapblocks = 0, maxused = 47K, limit = 78644K, spare = 0,
sizes = (16,32,64,128,256,2048,4096,8192))
kern.malloc.kmemstat.USB_device=(inuse = 20, calls = 20, memuse = 2K,
limblocks = 0, mapblocks = 0, maxused = 2K, limit = 78644K, spare = 0,
sizes = (16,32,128,256))
kern.malloc.kmemstat.USB_HC=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.memdesc=(inuse = 1, calls = 1, memuse = 4K,
limblocks = 0, mapblocks = 0, maxused = 4K, limit = 78644K, spare = 0,
sizes = (4096))
kern.malloc.kmemstat.crypto_data=(inuse = 1, calls = 1, memuse = 1K,
limblocks = 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0,
sizes = (1024))
kern.malloc.kmemstat.IPsec_creds=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.emuldata=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.ip6_options=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.NDP=(inuse = 6, calls = 6, memuse = 1K, limblocks
= 0, mapblocks = 0, maxused = 1K, limit = 78644K, spare = 0, sizes =
(32))
kern.malloc.kmemstat.temp=(inuse = 64, calls = 14367, memuse = 2319K,
limblocks = 0, mapblocks = 0, maxused = 2383K, limit = 78644K, spare =
0, sizes =
(16,32,64,128,256,512,1024,2048,4096,8192,16384,65536,524288))
kern.malloc.kmemstat.NTFS_mount=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.NTFS_node=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.NTFS_fnode=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.NTFS_dir=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.NTFS_hash=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.NTFS_attr=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.NTFS_data=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.NTFS_decomp=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.NTFS_vrun=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.kqueue=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.SYN_cache=(inuse = 2, calls = 2, memuse = 16K,
limblocks = 0, mapblocks = 0, maxused = 16K, limit = 78644K, spare = 0,
sizes = (8192))
kern.malloc.kmemstat.UDF_mount=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.UDF_file_entry=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.UDF_file_id=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.AGP_Memory=(inuse = 0, calls = 0, memuse = 0K,
limblocks = 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0,
sizes = (none))
kern.malloc.kmemstat.DRM=(inuse = 0, calls = 0, memuse = 0K, limblocks
= 0, mapblocks = 0, maxused = 0K, limit = 78644K, spare = 0, sizes =
(none))
kern.cp_time=688,0,1174,130,199,121719
kern.nchstats.good_hits=48313
kern.nchstats.negative_hits=2583
kern.nchstats.bad_hits=1896
kern.nchstats.false_hits=2
kern.nchstats.misses=25359
kern.nchstats.long_names=134
kern.nchstats.pass2=1260
kern.nchstats.2passes=1507
kern.nchstats.ncs_revhits=24
kern.nchstats.ncs_revmiss=0
kern.nchstats.ncs_dothits=0
kern.nchstats.nch_dotdothits=0
kern.forkstat.forks=1074
kern.forkstat.vforks=4
kern.forkstat.tforks=0
kern.forkstat.kthreads=21
kern.forkstat.fork_pages=174829
kern.forkstat.vfork_pages=61
kern.forkstat.tfork_pages=0
kern.forkstat.kthread_pages=0
kern.nselcoll=0
kern.tty.tk_nin=81
kern.tty.tk_nout=29036
kern.tty.tk_rawcc=74
kern.tty.tk_cancc=7
kern.ccpu=1948
kern.fscale=2048
kern.nprocs=48
kern.stackgap_random=262144
kern.allowkmem=0
kern.splassert=1
kern.nfiles=119
kern.ttycount=65
kern.numvnodes=7982
kern.seminfo.semmni=10
kern.seminfo.semmns=60
kern.seminfo.semmnu=30
kern.seminfo.semmsl=60
kern.seminfo.semopm=100
kern.seminfo.semume=10
kern.seminfo.semusz=112
kern.seminfo.semvmx=32767
kern.seminfo.semaem=16384
kern.shminfo.shmmax=33554432
kern.shminfo.shmmin=1
kern.shminfo.shmmni=128
kern.shminfo.shmseg=128
kern.shminfo.shmall=8192
kern.maxclusters=262144
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=tsc
kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000)
acpitimer0(1000) dummy(-1000000)
kern.maxlocksperuid=1024
kern.bufcachepercent=20
kern.wxabort=0
kern.consdev=tty00
kern.netlivelocks=20
kern.pool_debug=0
kern.global_ptrace=0
kern.audio.record=0
vm.loadavg=0.07 0.02 0.01
vm.psstrings=0x7f7ffffd8b00
vm.swapencrypt.enable=1
vm.swapencrypt.keyscreated=0
vm.swapencrypt.keysdeleted=0
vm.nkmempages=32768
vm.anonmin=10
vm.vtextmin=5
vm.vnodemin=10
fs.posix.setuid=1
net.inet.ip.forwarding=1
net.inet.ip.redirect=1
net.inet.ip.ttl=64
net.inet.ip.sourceroute=0
net.inet.ip.directed-broadcast=0
net.inet.ip.portfirst=1024
net.inet.ip.portlast=49151
net.inet.ip.porthifirst=49152
net.inet.ip.porthilast=65535
net.inet.ip.maxqueue=300
net.inet.ip.encdebug=0
net.inet.ip.ipsec-expire-acquire=30
net.inet.ip.ipsec-invalid-life=60
net.inet.ip.ipsec-pfs=1
net.inet.ip.ipsec-soft-allocs=0
net.inet.ip.ipsec-allocs=0
net.inet.ip.ipsec-soft-bytes=0
net.inet.ip.ipsec-bytes=0
net.inet.ip.ipsec-timeout=86400
net.inet.ip.ipsec-soft-timeout=80000
net.inet.ip.ipsec-soft-firstuse=3600
net.inet.ip.ipsec-firstuse=7200
net.inet.ip.ipsec-enc-alg=aes
net.inet.ip.ipsec-auth-alg=hmac-sha1
net.inet.ip.mtudisc=1
net.inet.ip.mtudisctimeout=600
net.inet.ip.ipsec-comp-alg=deflate
net.inet.ip.ifq.len=0
net.inet.ip.ifq.maxlen=2048
net.inet.ip.ifq.drops=0
net.inet.ip.mforwarding=0
net.inet.ip.multipath=0
net.inet.ip.mrtproto=19
net.inet.ip.arpqueued=0
net.inet.ip.arptimeout=1200
net.inet.ip.arpdown=20
net.inet.icmp.maskrepl=0
net.inet.icmp.bmcastecho=0
net.inet.icmp.errppslimit=100
net.inet.icmp.rediraccept=0
net.inet.icmp.redirtimeout=600
net.inet.icmp.tstamprepl=1
net.inet.ipip.allow=0
net.inet.tcp.rfc1323=1
net.inet.tcp.keepinittime=150
net.inet.tcp.keepidle=14400
net.inet.tcp.keepintvl=150
net.inet.tcp.slowhz=2
net.inet.tcp.baddynamic=1,7,9,11,13,15,17,18,19,20,21,22,23,25,37,42,43
,49,53,57,67,68,70,77,79,80,87,88,95,101,102,103,104,105,106,107,109,11
0,111,113,115,117,119,123,129,135,137,138,139,143,152,163,164,177,178,1
79,191,194,199,201,202,204,206,210,213,220,372,389,427,433,443,444,445,
464,465,468,512,513,514,515,521,526,530,531,532,540,543,544,545,548,554
,556,587,631,636,646,706,749,750,751,754,760,871,873,888,901,993,995,10
80,1109,1127,1433,1434,1524,1525,1529,1723,1900,2049,2105,2106,2108,211
0,2111,2112,2120,2121,2401,2600,2601,2602,2603,2604,2605,2606,2607,2608
,2627,2983,3031,3109,3260,3306,3389,3517,3689,3690,4190,4444,4500,4559,
5002,5060,5222,5269,5280,5298,5353,5354,5432,5680,5900,6000,6001,6002,6
003,6004,6005,6006,6007,6008,6009,6010,6514,6566,7000,7001,7002,7003,70
04,7005,7006,7007,7008,7009,7326,8025,8026,8140,8953,9418,10050,10051,1
6992,16993,16994,16995,20005
net.inet.tcp.sack=1
net.inet.tcp.mssdflt=512
net.inet.tcp.rstppslimit=100
net.inet.tcp.ackonpush=0
net.inet.tcp.ecn=0
net.inet.tcp.syncachelimit=10255
net.inet.tcp.synbucketlimit=105
net.inet.tcp.rfc3390=2
net.inet.tcp.reasslimit=32768
net.inet.tcp.sackholelimit=32768
net.inet.tcp.always_keepalive=0
net.inet.tcp.synuselimit=100000
net.inet.tcp.rootonly=2049
net.inet.tcp.synhashsize=293
net.inet.udp.checksum=1
net.inet.udp.baddynamic=7,9,13,18,19,22,37,39,49,53,67,68,69,70,80,88,1
05,107,109,110,111,123,129,135,137,138,139,143,161,162,163,164,177,178,
179,191,194,199,201,202,204,206,210,213,220,372,389,427,444,445,464,468
,500,512,513,514,517,518,520,525,533,546,547,548,554,587,623,631,636,64
6,664,706,749,750,751,993,995,1433,1434,1524,1525,1645,1646,1701,1723,1
812,1813,1900,2049,2401,3031,3517,3689,3784,3785,4190,4444,4500,4559,47
54,4755,4789,5002,5060,5298,5353,5354,5432,7000,7001,7002,7003,7004,700
5,7006,7007,7008,7009,7784,8025,8067,9418,10050,10051,16992,16993,16994
,16995,20005,26740
net.inet.udp.recvspace=41600
net.inet.udp.sendspace=9216
net.inet.udp.rootonly=2049
net.inet.gre.allow=0
net.inet.gre.wccp=0
net.inet.esp.enable=1
net.inet.esp.udpencap=1
net.inet.esp.udpencap_port=4500
net.inet.ah.enable=1
net.inet.mobileip.allow=0
net.inet.etherip.allow=0
net.inet.ipcomp.enable=0
net.inet.carp.allow=1
net.inet.carp.preempt=0
net.inet.carp.log=2
net.inet.divert.recvspace=65636
net.inet.divert.sendspace=65636
net.inet6.ip6.forwarding=0
net.inet6.ip6.redirect=1
net.inet6.ip6.hlim=64
net.inet6.ip6.mrtproto=0
net.inet6.ip6.maxfragpackets=200
net.inet6.ip6.log_interval=5
net.inet6.ip6.hdrnestlimit=10
net.inet6.ip6.dad_count=1
net.inet6.ip6.auto_flowlabel=1
net.inet6.ip6.defmcasthlim=1
net.inet6.ip6.use_deprecated=1
net.inet6.ip6.maxfrags=200
net.inet6.ip6.mforwarding=0
net.inet6.ip6.multipath=0
net.inet6.ip6.multicast_mtudisc=0
net.inet6.ip6.neighborgcthresh=2048
net.inet6.ip6.maxdynroutes=4096
net.inet6.ip6.dad_pending=0
net.inet6.ip6.mtudisctimeout=600
net.inet6.ip6.ifq.len=0
net.inet6.ip6.ifq.maxlen=2048
net.inet6.ip6.ifq.drops=0
net.inet6.ip6.soiikey=13ab253e5e922c81e30caa0833cc2d79
net.inet6.icmp6.redirtimeout=600
net.inet6.icmp6.nd6_delay=5
net.inet6.icmp6.nd6_umaxtries=3
net.inet6.icmp6.nd6_mmaxtries=3
net.inet6.icmp6.errppslimit=100
net.inet6.icmp6.nd6_maxnudhint=0
net.inet6.icmp6.mtudisc_hiwat=1280
net.inet6.icmp6.mtudisc_lowat=256
net.inet6.icmp6.nd6_debug=0
net.inet6.divert.recvspace=65636
net.inet6.divert.sendspace=65636
net.bpf.bufsize=32768
net.bpf.maxbufsize=2097152
net.mpls.ttl=255
net.mpls.maxloop_inkernel=16
net.mpls.mapttl_ip=1
net.mpls.mapttl_ip6=0
net.pipex.enable=0
net.pipex.inq.len=0
net.pipex.inq.maxlen=256
net.pipex.inq.drops=0
net.pipex.outq.len=0
net.pipex.outq.maxlen=256
net.pipex.outq.drops=0
hw.machine=amd64
hw.model=AMD GX-412TC SOC
hw.ncpu=4
hw.byteorder=1234
hw.pagesize=4096
hw.disknames=sd0:b67df39b2ae48f20
hw.diskcount=1
hw.sensors.km0.temp0=48.88 degC
hw.cpuspeed=998
hw.setperf=100
hw.vendor=PC Engines
hw.product=apu2
hw.version=1.0
hw.serialno=1076574
hw.physmem=1996279808
hw.usermem=1996267520
hw.ncpufound=4
hw.allowpowerdown=1
hw.perfpolicy=manual
hw.smt=0
hw.ncpuonline=4
machdep.console_device=tty00
machdep.bios.diskinfo.128=bootdev = 0xa0000204, cylinders = 1023, heads
= 255, sectors = 63
machdep.bios.cksumlen=1
machdep.allowaperture=0
machdep.cpuvendor=AuthenticAMD
machdep.cpuid=0x730f01
machdep.cpufeature=0x179bfbff
machdep.kbdreset=0
machdep.xcrypt=0
machdep.lidaction=1
machdep.forceukbd=0
machdep.tscfreq=998131218
machdep.invarianttsc=1
ddb.radix=16
ddb.max_width=80
ddb.max_line=24
ddb.tab_stop_width=8
ddb.panic=1
ddb.console=0
ddb.log=1
ddb.trigger=0
vfs.mounts.ffs has 9 mounted instances
vfs.ffs.max_softdeps=23704
vfs.ffs.sd_tickdelay=2
vfs.ffs.sd_worklist_push=0
vfs.ffs.sd_blk_limit_push=0
vfs.ffs.sd_ino_limit_push=0
vfs.ffs.sd_blk_limit_hit=0
vfs.ffs.sd_ino_limit_hit=0
vfs.ffs.sd_sync_limit_hit=0
vfs.ffs.sd_indir_blk_ptrs=0
vfs.ffs.sd_inode_bitmap=0
vfs.ffs.sd_direct_blk_ptrs=0
vfs.ffs.sd_dir_entry=0
vfs.ffs.dirhash_dirsize=2560
vfs.ffs.dirhash_maxmem=2097152
vfs.ffs.dirhash_mem=225588
vfs.nfs.iothreads=-1
vfs.fuse.fusefs_open_devices=0
vfs.fuse.fusefs_fbufs_in=0
vfs.fuse.fusefs_fbufs_wait=0
vfs.fuse.fusefs_pool_pages=0
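
A quick way to pull the knobs this thread cares about (SMT state, online CPUs) out of a dump like the one above is a simple grep. A self-contained sketch, with a few lines of the dump recreated inline so it runs anywhere; on the router itself you would grep the real `sysctl -a` output:

```shell
# Recreate a handful of lines from the dump above so the sketch is
# self-contained; on the APU2 you would use the actual sysctl -a output.
cat > /tmp/sysctl.out <<'EOF'
hw.ncpu=4
hw.smt=0
hw.ncpuonline=4
EOF
# Pull out the CPU-topology knobs Tom asks about below.
grep -E '^(hw\.smt|hw\.ncpuonline)' /tmp/sysctl.out
```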

On Thu, 2018-10-04 at 05:02 +0100, Tom Smyth wrote:

> can you show us a copy of your sysctl output?
>
> check if smt is disabled ...  (Hyper Threading )
>
> Im not sure if this would have an effect on the
> APU2C2 ...  but worth checking as it is a change
> in behaviour between 6.3 and current AFIK
>
> Thanks
>
> Tom Smyth
> On Thu, 4 Oct 2018 at 04:58, Benjamin Petit <[hidden email]> wrote:
> > Ok so I compared 6.3-release, 6.3-release+syspatches(=stable?) and
> > the latest snapshot from October 2.
> >
> > I measured iperf3 throughput between A and B, like this:
> > PC A <---> APU2 <---> PC B
> >
> > pf rules are the one shipped by default in 6.3:
> >
> >   gw# pfctl -sr
> >   block return all
> >   pass all flags S/SA
> >   block return in on ! lo0 proto tcp from any to any port 6000:6010
> >   block return out log proto tcp all user = 55
> >   block return out log proto udp all user = 55
> >
> > OpenBSD 6.3 RELEASE:
> >   - pf enabled:  841 Mbits/sec
> >   - pf disabled: 935 Mbits/sec
> >
> > OpenBSD 6.3 + Syspatch:
> >   - pf enabled:  803 Mbits/sec
> >   - pf disabled: 936 Mbits/sec
> >
> > OpenBSD CURRENT:
> >   - pf enabled: 526 Mbits/sec (541 with kern.pool_debug=0)
> >   - pf disabled: 934 Mbits/sec
> >
> > So there is a small perf drop when applying all syspatches to 6.3
> > (not sure which one cause the drop),
> > but the performance drop SIGNIFICANTLY using the latest snapshot.
> >
> > Am I missing something? (I really hope I am)
>
>
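
For reference, the stock /etc/pf.conf behind the default ruleset quoted above looks roughly like this. This is reconstructed from the pfctl -sr output, not copied from an install: the comments are mine, and reading uid 55 as the _pbuild user is an assumption.

```
set skip on lo

block return          # block stateless traffic
pass                  # establish keep-state

# By default, do not permit remote connections to X11
block return in on ! lo0 proto tcp to port 6000:6010

# Port build user does not need network
block return out log proto {tcp udp} user _pbuild
```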


Re: Performance impact of PF on APU2

Hrvoje Popovski
In reply to this post by Benjamin Petit
On 4.10.2018. 5:58, Benjamin Petit wrote:

> Ok so I compared 6.3-release, 6.3-release+syspatches(=stable?) and the latest snapshot from October 2.
>
> I measured iperf3 throughput between A and B, like this:
> PC A <---> APU2 <---> PC B
>
> pf rules are the one shipped by default in 6.3:
>
>   gw# pfctl -sr                                                                  
>   block return all
>   pass all flags S/SA
>   block return in on ! lo0 proto tcp from any to any port 6000:6010
>   block return out log proto tcp all user = 55
>   block return out log proto udp all user = 55
>
> OpenBSD 6.3 RELEASE:  
>   - pf enabled:  841 Mbits/sec 
>   - pf disabled: 935 Mbits/sec
>
> OpenBSD 6.3 + Syspatch:
>   - pf enabled:  803 Mbits/sec
>   - pf disabled: 936 Mbits/sec
>
> OpenBSD CURRENT:
>   - pf enabled: 526 Mbits/sec (541 with kern.pool_debug=0)
>   - pf disabled: 934 Mbits/sec
>
> So there is a small perf drop when applying all syspatches to 6.3 (not sure which one cause the drop),
> but the performance drop SIGNIFICANTLY using the latest snapshot.
>
> Am I missing something? (I really hope I am)
>

Hi,

If you're feeling brave enough and can test/experiment with pf,
you can download an OpenBSD kernel with experimental MP support
from here: http://kosjenka.srce.hr/~hrvoje/zaprocvat/smpfbsd

SHA256 (smpfbsd) =
e95e94190a0e52de7690b3278cfab14985817089e7a53615cd2599420593b32c

this kernel is compiled with option WITH_PF_LOCK and NET_TASKQ=4

Before you install it, please back up your active kernel so that if
something goes wrong you can put it back:

cp /bsd /goodbsd
cp smpfbsd /bsd
reboot

If something goes wrong, at the boot prompt (before the kernel starts
to boot) you can boot the old kernel with the command: boot goodbsd

I've been running this kernel for a few days, hitting pf, pfsync and
pflow quite hard, and it seems stable :)
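
Before copying the test kernel over /bsd, it is worth checking the download against the SHA256 posted above. A hedged sketch: an empty demo file (whose SHA256 is the well-known constant for zero-length input) stands in for the kernel so the sketch runs anywhere; for the real image, substitute Hrvoje's published hash and the real file name.

```shell
#!/bin/sh
# Verify a downloaded kernel image against its published SHA256 before
# installing it over /bsd.  An empty demo file stands in for the kernel.
: > /tmp/smpfbsd_demo
expected="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
# OpenBSD ships sha256(1); sha256sum is the GNU coreutils equivalent.
actual=$( sha256 -q /tmp/smpfbsd_demo 2>/dev/null || sha256sum /tmp/smpfbsd_demo | cut -d' ' -f1 )
if [ "$actual" = "$expected" ]; then
    echo "checksum OK"
else
    echo "checksum MISMATCH, do not install" >&2
fi
```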


Re: Performance impact of PF on APU2

Stuart Henderson
In reply to this post by Benjamin Petit
On 2018-10-04, Benjamin Petit <[hidden email]> wrote:
> I don't think the APU2 uses HT

correct



Re: Performance impact of PF on APU2

Benjamin Petit
In reply to this post by Hrvoje Popovski
I am very brave indeed :)
                                                           
  OpenBSD 6.4 (GENERIC.MP) #0: Wed Oct  3 13:49:29 CEST 2018
      [hidden email]:/sys/arch/amd64/compile/GENERIC.MP
  real mem = 1996279808 (1903MB)
  avail mem = 1926565888 (1837MB)
  mpath0 at root
  scsibus0 at mpath0: 256 targets
  mainbus0 at root
  bios0 at mainbus0: SMBIOS rev. 2.7 @ 0x77fd7020 (7 entries)
  bios0: vendor coreboot version "v4.0.19" date 20180902
  bios0: PC Engines apu2

But I see even worse performance now: 458 Mbits/sec

 
On Thu, 2018-10-04 at 22:26 +0200, Hrvoje Popovski wrote:

> On 4.10.2018. 5:58, Benjamin Petit wrote:
> > Ok so I compared 6.3-release, 6.3-release+syspatches(=stable?) and
> > the latest snapshot from October 2.
> >
> > I measured iperf3 throughput between A and B, like this:
> > PC A <---> APU2 <---> PC B
> >
> > pf rules are the one shipped by default in 6.3:
> >
> >   gw# pfctl
> > -sr                                                                
> >  
> >   block return all
> >   pass all flags S/SA
> >   block return in on ! lo0 proto tcp from any to any port 6000:6010
> >   block return out log proto tcp all user = 55
> >   block return out log proto udp all user = 55
> >
> > OpenBSD 6.3 RELEASE:  
> >   - pf enabled:  841 Mbits/sec  
> >   - pf disabled: 935 Mbits/sec
> >
> > OpenBSD 6.3 + Syspatch:
> >   - pf enabled:  803 Mbits/sec
> >   - pf disabled: 936 Mbits/sec
> >
> > OpenBSD CURRENT:
> >   - pf enabled: 526 Mbits/sec (541 with kern.pool_debug=0)
> >   - pf disabled: 934 Mbits/sec
> >
> > So there is a small perf drop when applying all syspatches to 6.3
> > (not sure which one cause the drop),
> > but the performance drop SIGNIFICANTLY using the latest snapshot.
> >
> > Am I missing something? (I really hope I am)
> >
>
> Hi,
>
> if you're feeling brave enough and you can test/experiment
> with pf you can download openbsd kernel with experimental MP support
> from here http://kosjenka.srce.hr/~hrvoje/zaprocvat/smpfbsd
>
> SHA256 (smpfbsd) =
> e95e94190a0e52de7690b3278cfab14985817089e7a53615cd2599420593b32c
>
> this kernel is compiled with option WITH_PF_LOCK and NET_TASKQ=4
>
> before you download it please backup your active kernel so if
> something
> goes wrong you can put it back ..
>
> cp /bsd /goodbsd
> cp smpfbsd /bsd
> reboot
>
> if something goes wrong at boot prompt before kernel starts to boot
> you
> can boot old kernel with command - boot goodbsd
>
> i'm running this kernel for few days and i'm hitting pf, pfsync and
> pflow quite hard and it seems stable :)
>


Re: Performance impact of PF on APU2

Benjamin Petit
Not sure what to do now. Should I open a bug for more visibility?

To be honest, my WAN connection is well below the maximum
measured here with CURRENT, but I don't want to discover when
upgrading to 6.5 that I have lost 40% of my performance again.

I would be more than happy to help with the investigation
(where to look, which settings to play with).

Otherwise I will have to switch to another OS, and I would rather
not. (Simple NAT rules with FreeBSD 11.2: ~890 Mbit/s; with
OpenWrt: ~950 Mbit/s.)

I am also surprised to see that the APU2 used as an iperf3 client
cannot saturate a gigabit link (without pf involved).

I get that performance is not the main focus of OpenBSD, but
this regression is kind of scary to me.
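
For what it's worth, the regression implied by the pf-enabled numbers quoted in this thread (841 Mbit/s on 6.3-release vs 526 Mbit/s on -current) is close to that 40% figure; a one-liner to compute it:

```shell
# Throughput drop with pf enabled, 6.3-release (841) vs -current (526),
# using the Mbit/s figures posted earlier in the thread.
awk 'BEGIN { old = 841; new = 526; printf "%.1f%% drop\n", (old - new) * 100 / old }'
```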

Thanks,

On Thu, 04 Oct 2018 17:33:37 -0700
Benjamin Petit <[hidden email]> wrote:

> I am very brave indeed :)
>                                                            
>   OpenBSD 6.4 (GENERIC.MP) #0: Wed Oct  3 13:49:29 CEST 2018
>       [hidden email]:/sys/arch/amd64/compile/GENERIC.MP
>   real mem = 1996279808 (1903MB)
>   avail mem = 1926565888 (1837MB)
>   mpath0 at root
>   scsibus0 at mpath0: 256 targets
>   mainbus0 at root
>   bios0 at mainbus0: SMBIOS rev. 2.7 @ 0x77fd7020 (7 entries)
>   bios0: vendor coreboot version "v4.0.19" date 20180902
>   bios0: PC Engines apu2
>
> But I see even worst performance now: 458 Mbits/sec
>
>  
> On Thu, 2018-10-04 at 22:26 +0200, Hrvoje Popovski wrote:
> > On 4.10.2018. 5:58, Benjamin Petit wrote:
> > > Ok so I compared 6.3-release, 6.3-release+syspatches(=stable?) and
> > > the latest snapshot from October 2.
> > >
> > > I measured iperf3 throughput between A and B, like this:
> > > PC A <---> APU2 <---> PC B
> > >
> > > pf rules are the one shipped by default in 6.3:
> > >
> > >   gw# pfctl
> > > -sr                                                                
> > >  
> > >   block return all
> > >   pass all flags S/SA
> > >   block return in on ! lo0 proto tcp from any to any port 6000:6010
> > >   block return out log proto tcp all user = 55
> > >   block return out log proto udp all user = 55
> > >
> > > OpenBSD 6.3 RELEASE:  
> > >   - pf enabled:  841 Mbits/sec  
> > >   - pf disabled: 935 Mbits/sec
> > >
> > > OpenBSD 6.3 + Syspatch:
> > >   - pf enabled:  803 Mbits/sec
> > >   - pf disabled: 936 Mbits/sec
> > >
> > > OpenBSD CURRENT:
> > >   - pf enabled: 526 Mbits/sec (541 with kern.pool_debug=0)
> > >   - pf disabled: 934 Mbits/sec
> > >
> > > So there is a small perf drop when applying all syspatches to 6.3
> > > (not sure which one cause the drop),
> > > but the performance drop SIGNIFICANTLY using the latest snapshot.
> > >
> > > Am I missing something? (I really hope I am)
> > >
> >
> > Hi,
> >
> > if you're feeling brave enough and you can test/experiment
> > with pf you can download openbsd kernel with experimental MP support
> > from here http://kosjenka.srce.hr/~hrvoje/zaprocvat/smpfbsd
> >
> > SHA256 (smpfbsd) =
> > e95e94190a0e52de7690b3278cfab14985817089e7a53615cd2599420593b32c
> >
> > this kernel is compiled with option WITH_PF_LOCK and NET_TASKQ=4
> >
> > before you download it please backup your active kernel so if
> > something
> > goes wrong you can put it back ..
> >
> > cp /bsd /goodbsd
> > cp smpfbsd /bsd
> > reboot
> >
> > if something goes wrong at boot prompt before kernel starts to boot
> > you
> > can boot old kernel with command - boot goodbsd
> >
> > i'm running this kernel for few days and i'm hitting pf, pfsync and
> > pflow quite hard and it seems stable :)
> >
>


--
Benjamin Petit <[hidden email]>


Re: Performance impact of PF on APU2

Chris Cappuccio
In reply to this post by Hrvoje Popovski
Hrvoje Popovski [[hidden email]] wrote:
> if you're feeling brave enough and you can test/experiment
> with pf you can download openbsd kernel with experimental MP support
> from here http://kosjenka.srce.hr/~hrvoje/zaprocvat/smpfbsd
>
> SHA256 (smpfbsd) =
> e95e94190a0e52de7690b3278cfab14985817089e7a53615cd2599420593b32c
>
> this kernel is compiled with option WITH_PF_LOCK and NET_TASKQ=4
>

Did you do "option NET_TASKQ=4"? Because there is no #ifdef NET_TASKQ,
so you have to edit /usr/src/sys/net/if.c directly if you didn't already.

Chris
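
A quick way to confirm the point Chris is making, i.e. whether a kernel config option is actually consulted by the source, is to grep the file for a preprocessor guard on the symbol. A self-contained sketch: a tiny stand-in for sys/net/if.c is created inline; in a real source tree you would grep the actual file.

```shell
# If 'NET_TASKQ' never appears in an #ifdef/#ifndef in the source,
# 'option NET_TASKQ=4' in the kernel config is inert, and the
# initializer has to be edited directly, as Chris notes.
cat > /tmp/if_demo.c <<'EOF'
/* stand-in for sys/net/if.c: the option symbol is never consulted */
static int nettaskqs = 1;
EOF
if grep -q 'ifdef NET_TASKQ' /tmp/if_demo.c; then
    echo "option honored"
else
    echo "option inert: edit the initializer directly"
fi
```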