Quad ethernet card

Re: Quad ethernet card

Dave Harrison-3
Henning Brauer wrote:

> * Ronnie Garcia <[hidden email]> [2007-06-06 13:04]:
>> Henning Brauer wrote:
>>> * nate <[hidden email]> [2007-06-05 21:44]:
>>>> I built 3 OpenBSD 3.6(?) servers in mid 2005 with these cards, and
>>>> was able to get a peak throughput of about 520Mbps in bridged mode
>>>> (pf disabled) measured using iperf.
>>> the single-stream tcp test iperf uses is pretty meaningless
>>> (unless.. well, that's another story)
>> What other tool would you recommend, then?
>
> they all suck.
>
> best "simulation" is recording your real-world traffic using tcpdump and
> then replaying it with tcpreplay. but that is tricky too.

Well, if you're interested in working out a vaguely realistic benchmark
for the throughput of your appliance, I recommend choosing a type of
traffic and focusing on it.  So perhaps that's HTTP, SMTP or some other
obvious protocol.  Pick a diverse corpus of files or emails to handle,
then pass the traffic through the host and see how you go.
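
For an HTTP-focused run, something along these lines can work (the
address and numbers are made-up examples; vary --uri across runs to
cover your corpus):

  # client outside the firewall, web server behind it
  httperf --server=10.0.0.2 --port=80 --uri=/index.html \
          --num-conns=5000 --rate=100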

If you're just looking for a big number, open a single TCP session and
send a lot of traffic through it so you don't have to continually start
new sessions (sessions are comparatively expensive).
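
With iperf that's just (addresses are examples):

  # on the receiver
  iperf -s
  # on the sender: one TCP stream, 60 seconds
  iperf -c 10.0.0.2 -t 60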

Henning has a point in saying that most of the tools aren't great;
in the end all benchmarks are artificial in some measure.  Replaying
traffic is equally artificial, as it's only indicative of the traffic
you recorded - which is likely to be biased towards whatever was
happening at the time on your LAN.

When all's said and done, benchmark for the traffic you "expect" and
work from there.

HTH
Dave

Re: Quad ethernet card

Matt Rowley
> > best "simulation" is recording your real-world traffic using tcpdump and
> > then replaying it with tcpreplay. but that is tricky too.
>
> Henning has a point in saying that most of the tools aren't great;
> in the end all benchmarks are artificial in some measure.  Replaying
> traffic is equally artificial, as it's only indicative of the traffic
> you recorded - which is likely to be biased towards whatever was
> happening at the time on your LAN.

Also worth noting is that if you're generating traffic from a single host,
you're bound by the interrupt rates that host is capable of.  Generate
traffic from multiple sources if you really want to gauge high load.
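
A crude way to do that is to kick off generators on several boxes at
once, e.g. (hostnames and address are placeholders):

  for h in gen1 gen2 gen3; do
      ssh $h "iperf -c 10.0.0.2 -t 60 -P 4" &
  done
  wait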

--Matt

Re: Quad ethernet card

Dave Harrison-3
Matt Rowley wrote:

>>> best "simulation" is recording your real-world traffic using tcpdump and
>>> then replaying it with tcpreplay. but that is tricky too.
>> Henning has a point in saying that most of the tools aren't great;
>> in the end all benchmarks are artificial in some measure.  Replaying
>> traffic is equally artificial, as it's only indicative of the traffic
>> you recorded - which is likely to be biased towards whatever was
>> happening at the time on your LAN.
>
> Also worth noting is that if you're generating traffic from a single host,
> you're bound by the interrupt rates that host is capable of.  Generate
> traffic from multiple sources if you really want to gauge high load.

Definitely.  My personal experience is that an e1000 tops out at about
820-850 Mb/s of raw throughput - i.e. on a single TCP session.

Other things that may get in the way of Truly Awesome Throughput (TM)
include socket timeouts on either the client or server host, and file
descriptor limits; note that those only come into play when you're
trying to simulate a web server or the like.
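
On OpenBSD you can check and raise the descriptor limits roughly like
this (the value is an arbitrary example; for a permanent per-class
change see the openfiles entries in login.conf(5)):

  # kernel-wide open-file limit
  sysctl kern.maxfiles
  # per-process limit for the current shell
  ulimit -n 4096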

However I'm not aware of any tools that handle that kind of
distributed benchmark... anyone?

Re: Quad ethernet card

Henning Brauer
In reply to this post by Dave Harrison-3
* Dave Harrison <[hidden email]> [2007-06-06 13:52]:
> If you're just looking for a big number, open a single TCP session and
> send a lot of traffic through it so you don't have to continually start
> new sessions (sessions are comparatively expensive).

single tcp session benches are completely meaningless and will not max
out any device faster than a moose fart

--
Henning Brauer, [hidden email], [hidden email]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam

Re: Quad ethernet card

Jacob Yocom-Piatt-2
Henning Brauer wrote:
> single tcp session benches are completely meaningless and will not max
> out any device faster than a moose fart

was unaware that moose farts were slow. you learn something new every day :)

Re: Quad ethernet card

Diana Eichert
On Wed, 6 Jun 2007, Jacob Yocom-Piatt wrote:

> was unaware that moose farts were slow. you learn something new every day :)

i believe the speed of moose farts varies in relationship to the moose,
meese?, distance from Calgary.

Re: Quad ethernet card

nate-31
In reply to this post by Henning Brauer
Henning Brauer wrote:
> * nate <[hidden email]> [2007-06-05 21:44]:
>> I built 3 OpenBSD 3.6(?) servers in mid 2005 with these cards, and
>> was able to get a peak throughput of about 520Mbps in bridged mode
>> (pf disabled) measured using iperf.
>
> the single-stream tcp test iperf uses is pretty meaningless
> (unless.. well, that's another story)
>
>> Interrupt cpu time was ~30%, the rest of the cpu was idle.

hmm, well I would expect this would provide a maximum number for
throughput because there's only 1 connection, so no extra processing
vs multiple connections - not that multiple connections should
matter since it was a bridge, and pf was disabled for the test.

It doesn't make sense to me why more connections would increase
throughput; can you (or someone) explain why this would be the
case?

I also would expect that this maximum number likely would not
be achieved once pf is enabled and 'real world' traffic was flowing
through the system, keeping track of thousands of states from
the ~400 hosts on both sides of the firewall. But at least it would
give me a number; if I saw the same interrupt cpu% I could reasonably
expect the box to be maxed out. Fortunately normal network
traffic was quite low; the biggest users of bandwidth were file
copies via scp/rsync.

Someone replied to my original post off-list and told me about a
bug that was fixed in 2006 in the Intel GigE network driver that
reduces the number of pci hits per packet, thus increasing throughput
and packets per second, which may have contributed to the performance
issue I experienced (again, in mid 2005). Of course at the time I
participated in a thread very similar to this and I don't recall
anyone responding with their openbsd network performance, so I
had nothing to base it on (were the numbers normal? low? high?).
The FAQ says it's dependent on the system, and I purchased the
fastest 32-bit CPU that was on the market at the time (64-bit
was still too new; I think that was (one of) the first releases
to support 64-bit x86, and OpenBSD SMP crashed during boot on all
machines I tested at the time). Even now I think I've gotten
one response (may have been off-list) saying they get less than
500Mbit on their card (forgot which card off hand, not the Intel
one though).

So regardless of the performance I think it was about as fast as
it was going to get, at the time. Short of absurdly low numbers
(under 200Mbit, at which point I would have purchased a full hardware
firewall; we had just purchased 3000 gigabit switch ports so we
were spending a bit), I was going to stick with OpenBSD because
pf is a great tool, and easy to use, and the hardware was a good
price too, with hardware raid, triple redundant power supplies
(each on a separate UPS-backed circuit), hot swap fans etc.

In the end the firewalls seem to have worked out well; it's been
2 years since they launched and they haven't had a problem, and
fortunately network traffic is fairly low. Two firewalls are
in active use (for different network segments, and are
failover for each other's network segments), with a 3rd as a
cold standby server.

tcpreplay sounds like an interesting tool; I hadn't heard
about it until your post.
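
For the archives, the basic capture-and-replay loop looks something
like this (interface names are examples, and tcpreplay's flags vary
between versions):

  # record real traffic on the production box
  tcpdump -i em0 -w traffic.pcap
  # later, replay it at the device under test from another host
  tcpreplay -i em1 traffic.pcap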

nate

Re: Quad ethernet card

Ted Bullock
In reply to this post by Dave Harrison-3
Dave Harrison wrote:
> However I'm not aware of any tools that handle that kind of
> distributed benchmark... anyone?

httperf can be run in an array of clients (--client option), although
there is currently no way to automatically aggregate the results.
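
For example, with four client boxes you'd run (server name and numbers
are placeholders):

  # on box 0; use --client=1/4, 2/4, 3/4 on the other boxes,
  # each instance then generates its share of the load
  httperf --client=0/4 --server=testbox --num-conns=10000 --rate=250

and then sum the reported rates by hand.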



--
Theodore Bullock, <[hidden email], [hidden email]>
B.Sc Software Engineering
Bike Across Canada Adventure http://www.comlore.com/bike

Re: Quad ethernet card

Henning Brauer
In reply to this post by nate-31
* nate <[hidden email]> [2007-06-06 17:52]:

> Henning Brauer wrote:
> > * nate <[hidden email]> [2007-06-05 21:44]:
> >> I built 3 OpenBSD 3.6(?) servers in mid 2005 with these cards, and
> >> was able to get a peak throughput of about 520Mbps in bridged mode
> >> (pf disabled) measured using iperf.
> >
> > the single-stream tcp test iperf uses is pretty meaningless
> > (unless.. well, that's another story)
> >
> >> Interrupt cpu time was ~30%, the rest of the cpu was idle.
>
> hmm, well I would expect this would provide a maximum number for
> throughput because there's only 1 connection, so no extra processing
> vs multiple connections - not that multiple connections should
> matter since it was a bridge, and pf was disabled for the test.
>
> It doesn't make sense to me why more connections would increase
> throughput; can you (or someone) explain why this would be the
> case?

please go read up on tcp and the interactions between delay, window
size, bandwidth etc.
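
the short version: a single TCP stream can never move more than one
window per round trip, so (numbers picked purely for illustration):

  throughput <= window / RTT
              = 64 KB / 1 ms
              = 65536 * 8 bits / 0.001 s  ~ 524 Mbit/s

with a default-ish 64 KB window, even a 1 ms round trip caps a single
stream around 500 Mbit/s no matter how fast the hardware is.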

> machines I tested at the time). Even now I think I've gotten
> one response (may have been off-list) saying they get less than
> 500Mbit on their card (forgot which card off hand, not the Intel
> one though).

i have a customer where we route about 800 MBit/s of real world traffic.

--
Henning Brauer, [hidden email], [hidden email]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam

Re: Quad ethernet card

Theo de Raadt
In reply to this post by Dave Harrison-3
> Henning has a point in saying that most of the tools aren't great;
> in the end all benchmarks are artificial in some measure.  Replaying
> traffic is equally artificial, as it's only indicative of the traffic
> you recorded - which is likely to be biased towards whatever was
> happening at the time on your LAN.

henning is trying to make the network layer and pf -- on balance --
manage all types of traffic faster.

therefore it does not matter whether the traffic is artificial, as
long as it isn't skewed towards the unrealistic.

he's not working in the same area at all as you guys trying to make
your web servers serve a few more pages.

Re: Quad ethernet card

Darren S.
In reply to this post by Ronnie Garcia
On 6/6/07, Ronnie Garcia <[hidden email]> wrote:

> Henning Brauer wrote:
> > * nate <[hidden email]> [2007-06-05 21:44]:
> >> I built 3 OpenBSD 3.6(?) servers in mid 2005 with these cards, and
> >> was able to get a peak throughput of about 520Mbps in bridged mode
> >> (pf disabled) measured using iperf.
> >
> > the single-stream tcp test iperf uses is pretty meaningless
> > (unless.. well, that's another story)
>
> What other tool would you recommend, then? The idea is to simulate
> legit Internet traffic and/or DDoS traffic.

net/netrate (from FreeBSD) was just committed as a port. Might be useful.

http://www.undeadly.org/cgi?action=article&sid=20070603040549&mode=expanded
http://ports.openbsd.nu/net/netrate

DS

Re: Quad ethernet card

Chris Cappuccio
In reply to this post by Claudio Jeker
Claudio Jeker [[hidden email]] wrote:
>
> sis(4) is playing in the same league as rl(4). It works fine and I never
> had problems with it but I would never use it in a router with high
> performance needs.

No, the interface on sis is not as bad as the old rl chips.  Also, the 83816
sis supports interrupt hold-off (like newer fxp).  With recent openbsd changes
(such as only running nanotime() on interrupt instead of per-packet), the
interrupt hold-off feature now makes a significant difference under openbsd.

Older 83815 chips had serious problems with different cable lengths that the
driver tries to work around, and it took many revisions of work-arounds before
we even got it right.  None of those problems exist in the 83816, thankfully.
This bush-league cable shit is part of what gives the sis chip a bad
reputation.  Also I don't think the 83816 has a hw checksum feature.

Off topic, newer if_vr chips (like 6105M used in new beta boards from Soekris
and PCEngines) now have hw checksum, and they also fix the main performance
problems with the older design which required multiple memory copies.  So,
we need to import driver enhancements from freebsd to take advantage of it.

One thing vr doesn't support is an interrupt hold-off, but vr may have another
way to get a similar effect.  Unfortunately VIA has changed their documentation
policy for the worse over the past 5+ years, so it's harder to get docs.

Chris

Re: Quad ethernet card

Fredrik Carlsson
In reply to this post by Fredrik Carlsson
Hi,

Thanks for all the input, we have decided to go for a Dell PE860 with an
Intel quad pro/1000 GT adapter.

Best regards
Fredrik Carlsson

Re: Quad ethernet card

Henning Brauer
In reply to this post by Chris Cappuccio
* Chris Cappuccio <[hidden email]> [2007-06-06 21:17]:

> Claudio Jeker [[hidden email]] wrote:
> >
> > sis(4) is playing in the same league as rl(4). It works fine and I never
> > had problems with it but I would never use it in a router with high
> > performance needs.
>
> No, the interface on sis is not as bad as the old rl chips.  Also, the 83816
> sis supports interrupt hold-off (like newer fxp).  With recent openbsd changes
> (such as only running nanotime() on interrupt instead of per-packet), the
> interrupt hold-off feature now makes a significant difference under openbsd.

int mitigation has always made quite a difference, but now even
more so, I agree.

nonetheless sis is not far up from rl. far away from the real ones.

> Off topic, newer if_vr chips (like 6105M used in new beta boards from Soekris
> and PCEngines) now have hw checksum, and they also fix the main performance
> problems with the older design which required multiple memory copies.  So,
> we need to import driver enhancements from freebsd to take advantage of it.
>
> One thing vr doesn't support is an interrupt hold-off

which disqualifies it for any serious routing tasks.

--
Henning Brauer, [hidden email], [hidden email]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam

Re: Quad ethernet card

Chris Cappuccio
Henning Brauer [[hidden email]] wrote:
>
> int mitigation has always made quite a difference, but now even
> more so, I agree.
>

I could never see a difference on Soekris boxes with a 400 us delay in if_sis
in earlier OpenBSD versions.  But I never tried higher delays than that.

> nonetheless sis is not far up from rl. far away from the real ones.
>

I'd love to believe you, but tell me what you base this assertion on.

Is this with real-world comparisons of the same hardware running sis
and rl (and a "real" chip like fxp...)?  if_rl does m_copydata() and
bzero() all the time, where if_sis does not have to.  The equivalent
routine in if_sis (sis_encap) is about half the size.  The whole sis driver
is very simple and doesn't appear to do anything outrageous to support
the chip.

Is the bzero call where if_rl pads the frames in rl_encap a candidate to
move to memset, which gcc can better optimize?  Hmm, it's only called for
small packets that need to be padded to the minimum size for if_rl.

> > One thing vr doesn't support is an interrupt hold-off
>
> which disqualifies it for any serious routing tasks.
>

There may be another way to do this with if_vr.
