FW: [Bug 243590] TCP ECN not adhering extremely strictly to RFC3168 can cause massive TCP perf issues
After reading through the ECN-related TCP code on CVSweb for the respective MAIN branches, it is clear that OpenBSD and NetBSD (and probably all other BSD variants and derived OSes) suffer from the same problem as FreeBSD when running a transactional TCP session with ECN against a Linux client. By "transactional" I mean a session where data changes direction frequently, rather than one bulk transfer using only one half-connection; e.g. NFS, iSCSI, SMB, ...
Because Linux processes CWR only on packets that also contain data, there is a good probability that when OpenBSD places the CWR on an arbitrary next packet (as it does now), Linux ignores the CWR and ECE remains latched. This in turn causes the BSD sender to shrink the cwnd further, until by chance the CWR ends up on a data segment, which may happen only at a very small cwnd and after a couple of seconds.
The issue is documented in this FreeBSD bug report:
Note that the problem will not show up with "typical" bulk transfer testing, only when data is sent alternately by both ends, e.g. an NFS request for a large file block, the server sending the NFS response, etc.
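
To make the interaction concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not code from any of the stacks mentioned: the class and function names are hypothetical, each loop step stands in for one round trip, the sender halves cwnd at most once per unsent CWR, and the receiver honours CWR only on segments that carry data.

```python
from dataclasses import dataclass


@dataclass
class StrictReceiver:
    """Toy receiver: clears its ECE latch only if CWR arrives WITH data."""
    ece_latched: bool = False

    def on_segment(self, has_data: bool, ce_marked: bool, cwr: bool) -> bool:
        if cwr and has_data:
            self.ece_latched = False   # CWR on a pure ACK is ignored
        if ce_marked:
            self.ece_latched = True    # congestion seen, start echoing ECE
        return self.ece_latched        # ECE value echoed back to the sender


@dataclass
class Sender:
    """Toy sender; cwr_only_with_data selects the behaviour being compared."""
    cwnd: int
    cwr_only_with_data: bool
    cwr_pending: bool = False

    def next_segment(self, has_data: bool) -> bool:
        # Decide whether this outgoing segment carries the CWR flag.
        send_cwr = self.cwr_pending and (has_data or not self.cwr_only_with_data)
        if send_cwr:
            self.cwr_pending = False
        return send_cwr

    def on_ack(self, ece: bool) -> None:
        # Assumption: react to ECE only if no CWR is still waiting to be sent.
        if ece and not self.cwr_pending:
            self.cwnd = max(1, self.cwnd // 2)
            self.cwr_pending = True


def simulate(cwr_only_with_data: bool, segment_has_data, initial_cwnd: int = 64) -> int:
    """Return the final cwnd after sending the given segment pattern."""
    sender = Sender(cwnd=initial_cwnd, cwr_only_with_data=cwr_only_with_data)
    receiver = StrictReceiver()
    ce_marked = True                   # only the first segment is CE-marked
    for has_data in segment_has_data:
        cwr = sender.next_segment(has_data)
        ece = receiver.on_segment(has_data, ce_marked, cwr)
        ce_marked = False
        sender.on_ack(ece)
    return sender.cwnd


# One data segment, three pure ACKs (the peer is sending), then data again.
pattern = [True, False, False, False, True, True]
print("CWR on any next packet:", simulate(False, pattern))
print("CWR only with new data:", simulate(True, pattern))
```

In this model the sender that puts CWR on any next packet halves its window once per ignored CWR, while the sender that holds CWR back for the next data segment halves it only once. Real stacks differ in many details, so treat the numbers as illustrative only.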
From: [hidden email] <[hidden email]>
Sent: Thursday, 28 May 2020 00:35
To: [hidden email]
Subject: [Bug 243590] TCP ECN not adhering extremely strictly to RFC3168 can cause massive TCP perf issues
MFS r361436: MFC r361347: With RFC3168 ECN, CWR SHOULD only be sent with new data.
Overly conservative data receivers may ignore the CWR flag on other
packets, and keep ECE latched. This can result in continuous reduction
of the congestion window, and very poor performance when ECN is enabled.
This does NOT contain the merge of the change to RACK since at this
time that code does not exist in stable/11, and there is no plan to
merge RACK to stable/11.