slightly OT: TCP checksum and RFC conformity

Andreas Bartelt-2
Hi all,

I was wondering why such a simple checksum algorithm is implemented in
TCP. I suppose, it's because of the slow CPU performance many years ago.
This algorithm looks so unreliable to me that it can't even protect
against some pretty simple errors, which (I suppose) could also occur
randomly (though obviously very rarely in practice).
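
(Editorial sketch, not part of the original post.) To make "pretty simple errors" concrete, here is a minimal Python version of the 16-bit ones' complement checksum in the style of RFC 1071, applied to a made-up 8-byte payload. The payload bytes and the variable names are assumptions chosen for illustration. The sketch shows that reordering two 16-bit words, or flipping the same bit position in opposite directions in two words, leaves the checksum unchanged, while a single bit flip is caught.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical payload: four 16-bit words.
original = bytes.fromhex("123456789abcdef0")
# First two words swapped: addition is commutative, so the sum is unchanged.
swapped = bytes.fromhex("567812349abcdef0")
# Bit 2 cleared in word 0 (0x1234 -> 0x1230) and set in word 1
# (0x5678 -> 0x567c): the -4 and +4 cancel out.
double_flip = bytes.fromhex("1230567c9abcdef0")
# A single flipped bit (0x1234 -> 0x1235) does change the sum.
single_flip = bytes.fromhex("123556789abcdef0")

for name, payload in [
    ("original", original),
    ("swapped", swapped),
    ("double_flip", double_flip),
    ("single_flip", single_flip),
]:
    print(f"{name:12s} checksum = {internet_checksum(payload):#06x}")
```

A real TCP checksum also covers a pseudo-header and the TCP header; that is left out here because it does not change the arithmetic being illustrated.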

In RFC 1122 I've read that the TCP checksum MUST (the usual caps lock
problem...) be implemented:
...
4.2.2.7 TCP Checksum: RFC-793 Section 3.1

     Unlike the UDP checksum (see Section 4.1.3.4), the TCP checksum is
never optional. The sender MUST generate it and the receiver MUST check it.
...

So I'm wondering why it MUST be calculated:
is it necessary to implement a checksum in TCP because reliability at
layer 2 is insufficient in practice? I see only two possible answers to
this question:
1) yes - then it's a very old reliability bug and should be fixed,
because sooner or later the TCP checksum won't catch a random error
pattern in a segment. (should it be fixed by always using an alternate
TCP checksum option, e.g. an MD5 hash? Or by improving layer 2
reliability in hardware?) [btw, netstat -sp tcp shows me that there
are sometimes TCP checksum errors - 23 errors in 9 days on a slow DSL link]
2) no - so why not skip TCP checksum calculation altogether? (at least for
incoming segments this wouldn't break a thing besides the RFC itself).

I know that some new NICs do checksum calculation in hardware for
performance reasons, but this has nothing to do with the actual problem
(if there even is a necessity to calculate a checksum at transport layer).

Please correct me if my assumptions or conclusions are wrong.

regards,
Andreas

Re: slightly OT: TCP checksum and RFC conformity

Ted Unangst-2
On 11/16/05, Andreas Bartelt <[hidden email]> wrote:
> I was wondering why such a simple checksum algorithm is implemented in
> TCP. I suppose, it's because of the slow CPU performance many years ago.

and that's the way the great tcp gods of old said it must be.

> In RFC 1122 I've read that the TCP checksum MUST (the usual caps lock
> problem...) be implemented:

i'm not sure if you're serious or not, but MUST has a particular meaning.

> So I'm wondering why it MUST be calculated:
> is it necessary to implement a checksum in TCP because reliability at
> layer 2 is insufficient in practice? I see only two possible answers to
> this question:
> 1) yes - then it's a very old reliability bug and should be fixed,
> because sooner or later the TCP checksum won't catch a random error
> pattern in a segment. (should it be fixed by always using an alternate
> TCP checksum option, e.g. an MD5 hash? Or by improving layer 2
> reliability in hardware?) [btw, netstat -sp tcp shows me that there
> are sometimes TCP checksum errors - 23 errors in 9 days on a slow DSL link]

good luck communicating with other tcp devices after you change your
checksum to md5.  the point is to be fast and catch some errors.
also, type end-to-end into google.

> 2) no - so why not skip TCP checksum calculation altogether? (at least for
> incoming segments this wouldn't break a thing besides the RFC itself).

because then you don't detect errors.

Re: slightly OT: TCP checksum and RFC conformity

Christian Weisgerber
In reply to this post by Andreas Bartelt-2
Andreas Bartelt <[hidden email]> wrote:

> I was wondering why such a simple checksum algorithm is implemented in
> TCP. I suppose, it's because of the slow CPU performance many years ago.
> This algorithm looks so unreliable to me that it can't even protect
> against some pretty simple errors, which (I suppose) also could occur

The idea is to protect against lost, duplicated, or reordered
packets.  If the underlying medium suffers from bit errors, the
link layer should handle those.  For example, Ethernet frames include
a 32-bit CRC, which makes it possible to recognize and discard
corrupted packets.
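
(Editorial sketch, not part of the original post.) As a rough illustration of why a 32-bit CRC is a stronger bit-error detector than the TCP checksum, the Python snippet below compares the two on a made-up 8-byte payload whose first two 16-bit words get swapped. It uses `zlib.crc32`, which is based on the same generator polynomial as the Ethernet frame check sequence; the payload and helper names are assumptions for illustration, and real frames are processed with additional framing details that are ignored here.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += int.from_bytes(data[i:i + 2], "big")
    while total >> 16:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical payload and a corrupted copy with the first two words swapped.
frame = bytes.fromhex("123456789abcdef0")
corrupted = bytes.fromhex("567812349abcdef0")

# The ones' complement sum cannot see a reordering of words ...
same_tcp_checksum = internet_checksum(frame) == internet_checksum(corrupted)
# ... while CRC-32 detects any error burst confined to 32 consecutive bits,
# which covers a swap of two adjacent, different 16-bit words.
same_crc32 = zlib.crc32(frame) == zlib.crc32(corrupted)

print("TCP-style checksum unchanged:", same_tcp_checksum)
print("CRC-32 unchanged:            ", same_crc32)
```

Note that a link-layer CRC only covers a frame while it is on that particular link; it says nothing about what happens to the data inside a router or host between links.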

--
Christian "naddy" Weisgerber                          [hidden email]

Re: slightly OT: TCP checksum and RFC conformity

Andreas Bartelt-2
In reply to this post by Ted Unangst-2
Hi,

Ted Unangst wrote:
...
> good luck communicating with other tcp devices after you change your
> checksum to md5.  the point is to be fast and catch some errors.
> also, type end-to-end into google.
>

thanks for the interesting paper. I now understand why it makes sense to
use a checksum at the link layer which catches only "most" errors, because
not all applications require full protection against random errors. I
also understand that error detection/error correction is always a
performance tradeoff, which also depends on the reliability requirements
and the latency of the connection.

As you know, TCP has been adapted to changing requirements in the past
via TCP options, which also provide a fallback mechanism. RFC 1146 is
about alternate TCP checksums (I don't know how good they are), but I've
found no clues about actual implementations of them. Please tell me, did
I just search in the wrong places?

>>2) no - so why not skip TCP checksum calculation altogether? (at least for
>>incoming segments this wouldn't break a thing besides the RFC itself).
>
>
> because then you don't detect errors.
>

That's exactly my point. My basic assumption was that the TCP checksum
doesn't provide enough protection against random errors. By googling for
'crc tcp checksum disagree' I've found a paper which seems to confirm this.

The tcp(4) man page says "The TCP protocol provides a reliable,
flow-controlled, two-way transmission of data." It doesn't say "The TCP
protocol provides a reliable, ..., only if shit doesn't happen".

As much better algorithms for error detection are known and PC
performance (and also Internet traffic) has increased a lot since the
introduction of TCP - do you think that the original checksum algorithm
is still the best choice in terms of a reliability/performance tradeoff?

regards,
Andreas

Re: slightly OT: TCP checksum and RFC conformity

chefren
On 11/17/05 00:39, Andreas Bartelt wrote:

> As much better algorithms for error detection are known

What's better? Can those algorithms run with only a few hardware gates at 10Gbit
speeds too?

> and PC performance (and also Internet traffic) has increased a lot since the
> introduction of TCP

And "internet speed", didn't that increase too? Don't you think there is some
balance there?

> - do you think that the original checksum algorithm
> is still the best choice in terms of a reliability/performance tradeoff?

It's good enough and, eh, "compatible". It's clueless to try to develop an
incompatible version of TCP; that won't be TCP but something else.

+++chefren

Re: slightly OT: TCP checksum and RFC conformity

Damien Miller
In reply to this post by Andreas Bartelt-2
On Thu, 17 Nov 2005, Andreas Bartelt wrote:

> As much better algorithms for error detection are known and PC performance
> (and also Internet traffic) has increased a lot since the introduction of TCP
> - do you think that the original checksum algorithm is still the best choice
> in terms of a reliability/performance tradeoff?

If you care about errors creeping in from the link-layer, then you can run
IPsec AH. Most people don't care, because their link layers are pretty
good. People with bad link layers tend to implement decent error detection
and correction there.

E.g.

[djm@fuyu djm]$ netstat -sp ip | grep -E '(bad.*checksum|total packets)'
         61092730 total packets received
         0 bad header checksums

Given that a) stronger mechanisms exist if you want to use them, b) this
isn't a problem in real life and c) OpenBSD isn't going to make unilateral
TCP changes that break its ability to speak to everyone else on the
Internet, you should probably find a different windmill to attack :)

-d

Re: slightly OT: TCP checksum and RFC conformity

Tobias Weingartner-2
In reply to this post by Andreas Bartelt-2
On Thursday, November 17, Andreas Bartelt wrote:
>
> As much better algorithms for error detection are known and PC
> performance (and also Internet traffic) has increased a lot since the
> introduction of TCP - do you think that the original checksum algorithm
> is still the best choice in terms of a reliability/performance tradeoff?

Nope, it is not.  But that's the reason it's called a "standard".  You
get some good, and some bad with them.  Welcome to the real world...

--Toby.

Re: slightly OT: TCP checksum and RFC conformity

Andreas Bartelt-2
In reply to this post by Damien Miller
Hi,

Damien Miller wrote:
...
> [djm@fuyu djm]$ netstat -sp ip | grep -E '(bad.*checksum|total packets)'
>         61092730 total packets received
>         0 bad header checksums
>

wouldn't netstat -sp tcp | grep -E '(bad.*checksum|total packets)' give
the output of interest?

(uptime 10 days on my slow ADSL link)
netstat -sp ip | grep -E '(bad.*checksum|total packets)'
         2448320 total packets received
         0 bad header checksums
netstat -sp tcp | grep -E '(bad.*checksum|total packets)'
                 23 discarded for bad checksums
                 0 bad/missing md5 checksums

Doesn't this mean that 23 errors were not detected by the link layer
(probably because the errors were introduced some hops away from me) and
only the TCP checksum caught them?
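
(Editorial sketch, not part of the original post.) For a feel of the magnitudes involved, the figures above can be plugged into a back-of-envelope estimate in Python. Two things here are assumptions, not measurements: that the total IP packet count is a usable stand-in for the number of TCP segments received, and that a corrupted segment slips past a 16-bit checksum with a probability of roughly 1 in 2^16. The second is an idealization for uniformly random corruption; real error patterns need not behave that way, so treat the result as an order-of-magnitude guess only.

```python
# Figures taken from the netstat output quoted above.
tcp_bad_checksums = 23          # segments discarded for bad TCP checksums
ip_packets_received = 2_448_320  # total IP packets received
uptime_days = 10

# Assumption: uniformly random corruption passes a 16-bit check
# about once in 2**16 tries.
miss_probability = 1 / 2**16

# Fraction of received packets that the TCP checksum rejected.
detected_rate = tcp_bad_checksums / ip_packets_received

# Expected number of corrupted segments that went unnoticed in this period.
expected_undetected = tcp_bad_checksums * miss_probability

# How long one would wait, at this rate, for one undetected error.
years_per_undetected = uptime_days / expected_undetected / 365.25

print(f"rejected: about 1 in {1 / detected_rate:,.0f} packets")
print(f"expected undetected errors in {uptime_days} days: {expected_undetected:.6f}")
print(f"roughly one undetected error every {years_per_undetected:.0f} years")
```

Under these assumptions a single slow link looks harmless, but the same arithmetic scaled to many hosts and much higher packet rates gives a less comfortable number.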

I hope you're right and it's not a reliability problem in practice.

regards,
Andreas

Re: slightly OT: TCP checksum and RFC conformity

Andreas Bartelt-2
In reply to this post by Tobias Weingartner-2
Hi,

Tobias Weingartner wrote:

> On Thursday, November 17, Andreas Bartelt wrote:
>
>>As much better algorithms for error detection are known and PC
>>performance (and also Internet traffic) has increased a lot since the
>>introduction of TCP - do you think that the original checksum algorithm
>>is still the best choice in terms of a reliability/performance tradeoff?
>
>
> Nope, it is not.  But that's the reason it's called a "standard".  You
> get some good, and some bad with them.  Welcome to the real world...
>

it's probably my lack of knowledge, but I thought it would be possible
to solve this by a TCP option without breaking interoperability. So this
is actually a design decision which can't be corrected without a TCP
replacement (which, I guess, won't happen in the next few years)?

regards,
Andreas

Re: slightly OT: TCP checksum and RFC conformity

Otto Moerbeek
On Thu, 17 Nov 2005, Andreas Bartelt wrote:

> Hi,
>
> Tobias Weingartner wrote:
> > On Thursday, November 17, Andreas Bartelt wrote:
> >
> > > As much better algorithms for error detection are known and PC performance
> > > (and also Internet traffic) has increased a lot since the introduction of
> > > TCP - do you think that the original checksum algorithm is still the best
> > > choice in terms of a reliability/performance tradeoff?
> >
> >
> > Nope, it is not.  But that's the reason it's called a "standard".  You
> > get some good, and some bad with them.  Welcome to the real world...
> >
>
> it's probably my lack of knowledge, but I thought it would be possible to
> solve this by a TCP option without breaking interoperability. So this is
> actually a design decision which can't be corrected without a TCP replacement
> (which, I guess, won't happen in the next few years)?

Yes, it could be solved with options. There's even an RFC for it. But
since apparently nobody is implementing it, it is probably not very
interesting.
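
(Editorial sketch, not part of the original post.) The reason an option can do this without breaking interoperability is the fallback: an alternate checksum is used only if both SYN segments ask for the same one, otherwise the standard checksum stays in force. The Python sketch below models that rule in a simplified way. The option kind (14), its length (3), and algorithm number 0 for the standard checksum reflect my reading of RFC 1146 and should be verified against the RFC; the function names are hypothetical and this is not how any particular stack implements it.

```python
from typing import Optional

# Assumed values, per my reading of RFC 1146 (verify before relying on them).
ALT_CHECKSUM_REQUEST_KIND = 14
ALT_CHECKSUM_REQUEST_LEN = 3
STANDARD_TCP_CHECKSUM = 0


def encode_request(algorithm: int) -> bytes:
    """Build the 3-byte alternate checksum request option (kind, length, number)."""
    return bytes([ALT_CHECKSUM_REQUEST_KIND, ALT_CHECKSUM_REQUEST_LEN, algorithm])


def parse_request(options: bytes) -> Optional[int]:
    """Walk a TCP options byte string; return the requested algorithm, if any."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:      # End of Option List
            break
        if kind == 1:      # No-Operation, a single byte without a length
            i += 1
            continue
        if i + 1 >= len(options):
            break          # truncated option, stop parsing
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break          # malformed length, stop parsing
        if kind == ALT_CHECKSUM_REQUEST_KIND and length == ALT_CHECKSUM_REQUEST_LEN:
            return options[i + 2]
        i += length
    return None


def negotiated_algorithm(local_syn_options: bytes, peer_syn_options: bytes) -> int:
    """Use an alternate checksum only if both SYNs request the same one."""
    mine = parse_request(local_syn_options)
    theirs = parse_request(peer_syn_options)
    if mine is not None and mine == theirs:
        return mine
    return STANDARD_TCP_CHECKSUM


# A peer that only sends an MSS option (kind 2, length 4) never agrees,
# so the connection falls back to the standard checksum.
mss_only = bytes([2, 4, 0x05, 0xB4])
print(negotiated_algorithm(encode_request(1), mss_only))
print(negotiated_algorithm(encode_request(1), mss_only + encode_request(1)))
```

A stack that has never heard of the option simply ignores it, which is why this kind of extension does not stop it from talking to everyone else.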

        -Otto

RE: Re: slightly OT: TCP checksum and RFC conformity

Tony Aberenthy
In reply to this post by Andreas Bartelt-2
[hidden email] wrote:

>Hi,
>
>Damien Miller wrote:
>...
>> [djm@fuyu djm]$ netstat -sp ip | grep -E '(bad.*checksum|total packets)'
>>   61092730 total packets received
>>   0 bad header checksums
>>
>
>wouldn't netstat -sp tcp | grep -E '(bad.*checksum|total packets)' give
>the output of interest?
>
>(uptime 10 days on my slow ADSL link)
>netstat -sp ip | grep -E '(bad.*checksum|total packets)'
>  2448320 total packets received
>  0 bad header checksums
>netstat -sp tcp | grep -E '(bad.*checksum|total packets)'
>  23 discarded for bad checksums
>  0 bad/missing md5 checksums
>
>Doesn't this mean that 23 errors were not detected by the link layer
>(probably because the errors were introduced some hops away from me) and
>only the TCP checksum caught them?
>
>I hope you're right and it's not a reliability problem in practice.
>
>regards,
>Andreas

Flames invited if I'm wrong, but I think that it
means that 23 packets were discarded for bad checksums.
Those 23 packets were discarded BEFORE being seen by the
next layer up.
Of course that may be just wishful thinking.
One easy stunt would be to generate correct checksums going
out for whatever garbage seems to have been received.
Repeat. Flames invited. Who/what do you trust?