IPsec performance

IPsec performance

Vincent Bernat
Hi!

I have several questions about IPsec performance in OpenBSD. I am
using IPsec to maintain more than 60 tunnels, and it performs well when
those tunnels are idle. The tunnels use either 3DES or AES; 3DES is
needed because the clients run Windows, where AES is not
available.

OpenBSD is running on a 2.4 GHz Celeron. openssl speed aes gives 70
MB/s and des-ede3 gives 15 MB/s, yet with 40 Mb/s (megabits/s) of
traffic the processor is at 100%. Why is there such a difference from
the openssl speed results?
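
To make the gap concrete, the figures above can be put in the same unit.
This is only a minimal sketch in Python using the numbers quoted in this
post; the function and variable names are made up for illustration, and
it assumes 8 megabits per megabyte with no protocol overhead.

```python
def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / 8.0


# Traffic level at which the CPU is reported to be at 100%.
observed = mbit_to_mbyte(40)

# MB/s as reported by "openssl speed" in this post.
openssl_speed = {"aes": 70.0, "des-ede3": 15.0}

for cipher, rate in openssl_speed.items():
    ratio = rate / observed
    print(f"{cipher}: benchmark is {ratio:.0f}x the observed {observed:.0f} MB/s")
```

So 40 Mb/s is 5 MB/s, a third of the 3DES benchmark figure and a
fourteenth of the AES one.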

I have added a Hifn 7955 crypto card. However, after one hour of
managing the 60 tunnels, it becomes impossible to do any symmetric
crypto. There is nothing in dmesg about it, and the only solution is
to reboot. With the card disabled, there is no such problem. Any idea
why I have this problem?

What kind of hardware will perform 3DES and AES encryption well? A C3
processor has AES encryption built in, but I must keep 3DES encryption
as well, and those processors are very slow at general operations.
Would a 2.2 GHz Opteron perform better than a 3 GHz Intel EM64T Xeon?

If I choose a multiprocessor system, will OpenBSD be able to use
the two processors efficiently for IPsec?
--
Write clearly - don't be too clever.
            - The Elements of Programming Style (Kernighan & Plauger)


Re: IPsec performance

J.C. Roberts-2
On Tue, 08 Nov 2005 08:51:06 +0100, Vincent Bernat <[hidden email]>
wrote:

>Hi !
>
>I  have several  questions about  IPsec performance  in OpenBSD.  I am
>using IPsec to maintain more than 60 tunnels and it performs well when
>those tunnels are idle. Tunnels are  either using 3DES or AES. 3DES is
>due  to the  fact that  clients  are using  Windows where  AES is  not
>available.
>
>OpenBSD is running on a Celeron 2.4 GHz and openssl speed aes gives 70
>MB/s and des-ede3 gives 15 MB/s. With 40 Mb/s (megabits/s) of traffic,
>the processor is used at 100%.  Why such a difference with the results
>of openssl speed.
>

Celeron? If your goal is to melt a nearly worthless processor into a
completely worthless chunk of slag, Celeron is perfect.

>I have  added an  Hifn 7955  crypto card. However,  after one  hour of
>managing the  60 tunnels,  it becomes impossible  to do  any symmetric
>crypto. There is nothing in the dmesg about that. The only solution is
>to reboot. With the card disabled,  there is no such problem. Any idea
>of why I have this problem ?
>

Just a wild guess but it possibly has something to do with the fact HiFn
refuses to make their documentation publicly available and because of
this support in OpenBSD is limited. As someone who went to lunch with
the HiFn CEO and VP of engineering over a year ago to address this
problem, the most I can say is I was told a lot of things but nothing
was actually done... The worst part about the experience was the fact
Theo told me it would be all talk and no action before I ever met with
HiFn.

Few things are worse than getting an implied "I told you so" from Theo.
All the same, at least I tried.

>What kind of hardware will perform 3DES and AES encryption well ? A C3
>processor has AES encryption built-in  but I must keep 3DES encryption
>as   well   and   those   processors   are  very   slow   on   general
>operations. Would  an Opteron  2.2 GHz perform  better than  an Intel
>EM64T Xeon 3 GHz ?
>
>If  I choose  a multiprocessor  system, will  OpenBSD be  able  to use
>efficiently the two processors for doing IPsec stuff ?
>

Now think to yourself on this one. You've got 60 tunnels that must be
serviced by the processor. A single threaded processor with limited
cache and task switching (i.e. Celeron) is the wrong choice if not the
worst choice you could make. The fake multi-core Intel stuff called
"Hyper Threading" is a small step in the right direction. Next up would
be real multi-core processors, and lastly, your best choice is having
multiple multi-core processors.

Having a custom ASIC (processor) specifically designed to do crypto
running as a co-processing slave to your system CPU is a great and
wonderful thing, but only if it actually works. Though it might not
solve your immediate problem, it would be good for the project if you
contacted HiFn yourself and asked them why their documentation is not
publicly available so the open source world can develop drivers.

"Chris Kenber"  ckenber(at)hifn.com  CEO
"Russell Dietz" RDietz(at)hifn.com   VP Eng

If, and only if, your real limitation is actually the processing power
needed for crypto, then obviously having more processing power will most
likely solve the problem. Before you decide the real problem is
processing power, please do yourself a favor and look for other possible
bottlenecks, like interrupt, network, memory... A machine with multiple
general-purpose multi-core processors is not cheap (though dual or quad
multi-core Opterons would be sweet). Tossing a general-purpose CPU at a
specific processing problem will help, but it's better and cheaper to use
custom co-processors, like crypto ASICs, to address the specific
processing task.

JCR


Re: IPsec performance

Otto Moerbeek
On Tue, 8 Nov 2005, J.C. Roberts wrote:

> On Tue, 08 Nov 2005 08:51:06 +0100, Vincent Bernat <[hidden email]>
> wrote:
>
> >Hi !
> >
> >I  have several  questions about  IPsec performance  in OpenBSD.  I am
> >using IPsec to maintain more than 60 tunnels and it performs well when
> >those tunnels are idle. Tunnels are  either using 3DES or AES. 3DES is
> >due  to the  fact that  clients  are using  Windows where  AES is  not
> >available.
> >
> >OpenBSD is running on a Celeron 2.4 GHz and openssl speed aes gives 70
> >MB/s and des-ede3 gives 15 MB/s. With 40 Mb/s (megabits/s) of traffic,
> >the processor is used at 100%.  Why such a difference with the results
> >of openssl speed.

Well, from your data I would expect the throughput to be somewhere
between 15 and 70 MB/s. You do not specify how much traffic is using AES
vs. 3DES.
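
One way to make that expectation concrete is sketched below in Python.
It rests on two loud assumptions: that the cipher is the only cost, and
that the openssl speed figures carry over to the kernel, which they may
well not. The function name and the sample traffic mixes are
hypothetical. Because time per byte adds up, the two rates combine as a
weighted harmonic mean rather than a plain average.

```python
def blended_throughput(frac_3des: float,
                       rate_3des: float = 15.0,
                       rate_aes: float = 70.0) -> float:
    """Cipher-bound throughput in MB/s when frac_3des of the bytes use 3DES.

    The default rates are the openssl speed figures quoted in this thread.
    """
    if not 0.0 <= frac_3des <= 1.0:
        raise ValueError("frac_3des must be between 0 and 1")
    # Seconds needed to encrypt one megabyte of mixed traffic.
    seconds_per_mb = frac_3des / rate_3des + (1.0 - frac_3des) / rate_aes
    return 1.0 / seconds_per_mb


# Hypothetical traffic mixes, from all AES to all 3DES.
for frac in (0.0, 0.5, 0.9, 1.0):
    print(f"{frac:.0%} 3DES -> {blended_throughput(frac):.1f} MB/s")
```

Under this model the result stays close to the 3DES figure as soon as
3DES carries most of the bytes, since the slow cipher dominates the time
spent.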

> Celeron? If your goal is to melt a nearly worthless processor into a
> completely worthless chunk of slag, Celeron is perfect.
>
> >I have  added an  Hifn 7955  crypto card. However,  after one  hour of
> >managing the  60 tunnels,  it becomes impossible  to do  any symmetric
> >crypto. There is nothing in the dmesg about that. The only solution is
> >to reboot. With the card disabled,  there is no such problem. Any idea
> >of why I have this problem ?

Apart from issues in the hifn driver, having a crypto accelerator only
helps if your processor is relatively slow. With fast main processors,
the overhead of communicating with the crypto coprocessor becomes too
big. Of course it depends on the kind of coprocessor and the kind of
crypto work done. Symmetric encryption has different computing needs
than asymmetric stuff.
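
The trade-off can be sketched with a simple cost model. This is an
illustration only, not a description of the hifn driver: it assumes a
fixed per-request overhead for talking to the card plus a constant card
throughput. The card rate and the overhead below are hypothetical
numbers; the two CPU rates are the openssl speed figures quoted earlier
in the thread.

```python
import math


def break_even_bytes(cpu_rate: float, card_rate: float, overhead_s: float) -> float:
    """Request size in bytes above which offloading beats software crypto.

    cpu_rate and card_rate are in bytes per second; overhead_s is the fixed
    per-request cost of using the card (setup, DMA, interrupt).
    Returns math.inf when the card never wins.
    """
    if card_rate <= cpu_rate:
        return math.inf
    # Solve n / cpu_rate = overhead_s + n / card_rate for n.
    return overhead_s / (1.0 / cpu_rate - 1.0 / card_rate)


CARD_RATE = 50e6   # hypothetical card throughput, bytes/s
OVERHEAD = 20e-6   # hypothetical per-request overhead, seconds

for label, cpu_rate in (("3DES in software", 15e6), ("AES in software", 70e6)):
    size = break_even_bytes(cpu_rate, CARD_RATE, OVERHEAD)
    print(f"{label}: offload pays off above {size:.0f} bytes")
```

With these made-up numbers the card helps for 3DES once requests exceed
a few hundred bytes, and never helps for AES, because the CPU is already
faster than the card.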

> Just a wild guess but it possibly has something to do with the fact HiFn
> refuses to make their documentation publicly available and because of
> this support in OpenBSD is limited. As someone who went to lunch with
> the HiFn CEO and VP of engineering over a year ago to address this
> problem, the most I can say is I was told a lot of things but nothing
> was actually done... The worst part about the experience was the fact
> Theo told me it would be all talk and no action before I ever met with
> HiFn.
>
> Few things are worse than getting an implied "I told you so" from Theo.
> All the same, at least I tried.
>
> >What kind of hardware will perform 3DES and AES encryption well ? A C3
> >processor has AES encryption built-in  but I must keep 3DES encryption
> >as   well   and   those   processors   are  very   slow   on   general
> >operations. Would  an Opteron  2.2 GHz perform  better than  an Intel
> >EM64T Xeon 3 GHz ?
> >
> >If  I choose  a multiprocessor  system, will  OpenBSD be  able  to use
> >efficiently the two processors for doing IPsec stuff ?
> >
>
> Now think to yourself on this one. You've got 60 tunnels that must be
> serviced by the processor. A single threaded processor with limited
> cache and task switching (i.e. Celeron) is the wrong choice if not the
> worst choice you could make. The fake multi-core Intel stuff called
> "Hyper Threading" is a small step in the right direction. Next up would
> be real multi-core processors, and lastly, your best choice is having
> multiple multi-core processors.

AFAIK, this is wrong. IPsec is done by the kernel, and kernel code
does not benefit from MP.

>
> Having a custom ASIC (processor) specifically designed to do crypto
> running as a co-processing slave to your system CPU is a great and
> wonderful thing, but only if it actually works. Though it might not
> solve your immediate problem, it would be good for the project if you
> contacted HiFn yourself and asked them why their documentation is not
> publicly available so the open source world can develop drivers.
>
> "Chris Kenber"  ckenber(at)hifn.com  CEO
> "Russell Dietz" RDietz(at)hifn.com   VP Eng
>
> If, and only if, your real limitation is actually the processing power
> needed for crypto, then obviously having more processing power will most
> likely solve the problem. Before you decide the real problem is
> processing power, please do yourself a favor and look for other possible
> bottlenecks, like interrupt, network, memory... A machine with multiple
> general-purpose multi-core processors is not cheap (though dual or quad
> multi-core Opterons would be sweet). Tossing a general-purpose CPU at a
> specific processing problem will help, but it's better and cheaper to use
> custom co-processors, like crypto ASICs, to address the specific
> processing task.

I would love to see some actual numbers from people running various
hardware. Otherwise this is all just speculation, partly based on
wrong assumptions.

        -Otto


Re: IPsec performance

Vincent Bernat
In reply to this post by J.C. Roberts-2
OoO On this rainy morning of Tuesday, 8 November 2005, at about 10:24,
"J.C. Roberts" <[hidden email]> said:

> Now think to yourself on this one. You've got 60 tunnels that must be
> serviced by the processor. A single threaded processor with limited
> cache and task switching (i.e. Celeron) is the wrong choice if not the
> worst choice you could make. The fake multi-core Intel stuff called
> "Hyper Threading" is a small step in the right direction. Next up would
> be real multi-core processors, and lastly, your best choice is having
> multiple multi-core processors.

Will OpenBSD 3.8 handle them well with the bsd.mp kernel?

> Having a custom ASIC (processor) specifically designed to do crypto
> running as a co-processing slave to your system CPU is a great and
> wonderful thing, but only if it actually works. Though it might not
> solve your immediate problem, it would be good for the project if you
> contacted HiFn yourself and asked them why their documentation is not
> publicly available so the open source world can develop drivers.

> "Chris Kenber"  ckenber(at)hifn.com  CEO
> "Russell Dietz" RDietz(at)hifn.com VP Eng

I will mail them today about this. Are Broadcom chipsets any better in
this respect?

> If, and only if, your real limitation is actually the processing power
> needed for crypto, then obviously having more processing power will most
> likely solve the problem. Before you decide the real problem is
> processing power, please do yourself a favor and look for other possible
> bottlenecks, like interrupt, network, memory...

 - top reports very low interrupt usage (at most 10% under full load).
 - there is plenty of memory available on the system: 37 MB of 256 MB
   are used and 171 MB are really free (from vmstat)
 - without IPsec, I am able to do an FTP transfer at 20 MB/s
   (this is a gigabit card), which is not spectacular but is above the
   current bottleneck. I have not looked at how to enable jumbo
   frames.

Thanks for your answer.
--
printk(KERN_WARNING "Multi-volume CD somehow got mounted.\n");
        2.2.16 /usr/src/linux/fs/isofs/inode.c


Re: IPsec performance

Vincent Bernat
In reply to this post by Otto Moerbeek
OoO Late on this radiant morning of Tuesday, 8 November 2005, at about
11:05, Otto Moerbeek <[hidden email]> said:

>> >OpenBSD is running on a Celeron 2.4 GHz and openssl speed aes gives 70
>> >MB/s and des-ede3 gives 15 MB/s. With 40 Mb/s (megabits/s) of traffic,
>> >the processor is used at 100%.  Why such a difference with the results
>> >of openssl speed.

> Well, from your data I would expect the throughput to be somewhere
> between 15 and 70 MB/s. You do not specify how much traffic is using AES
> vs. 3DES.

I should run some benchmarks. Most of the traffic is 3DES.

> Apart from issues in the hifn driver, having a crypto accelerator only
> helps if your processor is relatively slow. With fast main processors,
> the overhead of communicating with the crypto coprocessor becomes too
> big. Of course it depends on the kind of coprocessor and the kind of
> crypto work done. Symmetric encryption has different computing needs
> than asymmetric stuff.

The Hifn 7955 helped a lot with the high interrupt usage: 20 Mbit/s
of traffic took 15% interrupt time and almost no CPU.
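
As a back-of-the-envelope check, that single measurement can be
extrapolated. The sketch below assumes interrupt time grows linearly
with traffic, which is only an assumption and has not been measured
here; the function name is made up for illustration.

```python
def saturation_rate(measured_mbit: float, interrupt_share: float) -> float:
    """Traffic in Mbit/s at which interrupt time would reach 100%.

    Linear extrapolation from one data point: measured_mbit of traffic
    costing interrupt_share (0..1) of the CPU.
    """
    if not 0.0 < interrupt_share <= 1.0:
        raise ValueError("interrupt_share must be in (0, 1]")
    return measured_mbit / interrupt_share


# 20 Mbit/s took 15% interrupt time with the card enabled.
print(f"{saturation_rate(20, 0.15):.0f} Mbit/s")
```

If the scaling really were linear, the card would leave headroom up to
roughly 133 Mbit/s, well above the 40 Mb/s that saturates the CPU
without it.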

> I would love to see some actual numbers of people running various
> hardware. Otherwise this is all just speculation, partly based on
> wrong assumptions.

I will try to benchmark the current configuration with both 3DES and
AES transfers. Since it is in production, I cannot easily alter the
configuration (enabling the crypto card, for example), but I will try
one night.
--
Use free-form input when possible.
            - The Elements of Programming Style (Kernighan & Plauger)


Re: IPsec performance

Henning Brauer
In reply to this post by J.C. Roberts-2
* J.C. Roberts <[hidden email]> [2005-11-08 10:26]:
> Now think to yourself on this one. You've got 60 tunnels that must be
> serviced by the processor. A single threaded processor with limited
> cache and task switching (i.e. Celeron) is the wrong choice if not the
> worst choice you could make. The fake multi-core Intel stuff called
> "Hyper Threading" is a small step in the right direction. Next up would
> be real multi-core processors, and lastly, your best choice is having
> multiple multi-core processors.

no.
there is no benefit from SMP in this case.

--
BS Web Services, http://www.bsws.de/
OpenBSD-based Webhosting, Mail Services, Managed Servers, ...
Unix is very simple, but it takes a genius to understand the simplicity.
(Dennis Ritchie)


Re: IPsec performance

J.C. Roberts-2
On Wed, 9 Nov 2005 14:34:27 +0100, Henning Brauer
<[hidden email]> wrote:

>* J.C. Roberts <[hidden email]> [2005-11-08 10:26]:
>> Now think to yourself on this one. You've got 60 tunnels that must be
>> serviced by the processor. A single threaded processor with limited
>> cache and task switching (i.e. Celeron) is the wrong choice if not the
>> worst choice you could make. The fake multi-core Intel stuff called
>> "Hyper Threading" is a small step in the right direction. Next up would
>> be real multi-core processors, and lastly, your best choice is having
>> multiple multi-core processors.
>
>no.
>there is no benefit from SMP in this case.

None at all? -Hmmm... sounds suspicious.

I assume Otto is correct about the IPSec implementation being in kernel
and not benefitting directly from SMP, yet depending on what *else* is
running on the box, SMP could still provide some indirect benefit by
offloading the other stuff to a second processor/core.

Of course, indirect benefits don't scale as more processors/cores are
added, so I was dead wrong about having lots of them. Bummer.

JCR


Re: IPsec performance

Henning Brauer
* J.C. Roberts <[hidden email]> [2005-11-09 16:50]:

> On Wed, 9 Nov 2005 14:34:27 +0100, Henning Brauer
> <[hidden email]> wrote:
> >* J.C. Roberts <[hidden email]> [2005-11-08 10:26]:
> >> Now think to yourself on this one. You've got 60 tunnels that must be
> >> serviced by the processor. A single threaded processor with limited
> >> cache and task switching (i.e. Celeron) is the wrong choice if not the
> >> worst choice you could make. The fake multi-core Intel stuff called
> >> "Hyper Threading" is a small step in the right direction. Next up would
> >> be real multi-core processors, and lastly, your best choice is having
> >> multiple multi-core processors.
> >no.
> >there is no benefit from SMP in this case.
> None at all? -Hmmm... sounds suspicious.
>
> I assume Otto is correct about the IPSec implementation being in kernel
> and not benefitting directly from SMP, yet depending on what *else* is
> running on the box, SMP could still provide some indirect benefit by
> offloading the other stuff to a second processor/core.

in theory, yes.
and then there's extra synchronization and locking cost in the SMP case.
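
A rough way to picture this is Amdahl's law with an added term for
locking. This is only a sketch: the fraction of work that can move to
another CPU and the locking overhead are hypothetical values, not
measurements of OpenBSD.

```python
def smp_speedup(parallel_fraction: float, cpus: int,
                lock_overhead: float = 0.0) -> float:
    """Amdahl's law with an extra serial cost for synchronization.

    parallel_fraction: share of the single-CPU run time that can run on
        other CPUs (here, the non-IPsec work on the box).
    lock_overhead: extra time spent on locking in the SMP case, as a
        fraction of the single-CPU run time (hypothetical parameter).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cpus + lock_overhead)


# Hypothetical box where 10% of the load is not kernel IPsec work.
for cpus in (2, 8):
    print(cpus, "CPUs, no locking cost:", round(smp_speedup(0.1, cpus), 3))
print("2 CPUs, 5% locking cost:", round(smp_speedup(0.1, 2, 0.05), 3))
```

With these made-up inputs the gain is about 5% on two CPUs and under 10%
on eight, and a 5% locking cost wipes out the two-CPU gain entirely.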

--
BS Web Services, http://www.bsws.de/
OpenBSD-based Webhosting, Mail Services, Managed Servers, ...
Unix is very simple, but it takes a genius to understand the simplicity.
(Dennis Ritchie)


Re: IPsec performance

Bruno Delbono
In reply to this post by Henning Brauer
Henning Brauer wrote:

> no.
> there is no benefit from SMP in this case.

I believe you. In fact, many Cisco VPN routers came with 400-700 MHz
Intel PII or PIII CPUs, which could handle a few thousand IPsec
connections.

I've had good throughput with a 1.0 GHz PIII with a large number of
clients.

-Bruno