Recognizing Randomness Exhaustion

Recognizing Randomness Exhaustion

Libertas
Some of the people at [hidden email] and I are trying to
figure out why Tor relays under-perform when running on OpenBSD. Many
such relays aren't even close to being network-bound,
file-descriptor-bound, memory-bound, or CPU-bound, but relay at least
33-50% less traffic than would be expected of a Linux machine in the
same situation.

For those not familiar, a Tor relay will eventually have an open TCP
connection for each of the other >6,000 active relays, and (if it allows
exit traffic) must make outside TCP connections for the user's requests,
so it's pretty file-hungry and crypto-intensive.
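
(For the curious, one quick way to see how file-hungry a relay actually is
on OpenBSD -- a rough sketch, assuming the process is simply named "tor":)

    fstat -p $(pgrep -x tor) | wc -l   # descriptors currently open by the tor process
    sysctl kern.maxfiles               # system-wide descriptor ceiling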

One possible explanation is that its randomness store gets exhausted. I
once saw errors like this in my Tor logs, but I don't know how to test
if it's a chronic problem. I also couldn't find anything online. Is
there any easy way to test if this is the bottleneck?

Re: [Tor-BSD] Recognizing Randomness Exhaustion

Greg Troxel-2
Libertas <[hidden email]> writes:

> Some of the people at [hidden email] and I are trying to
> figure out why Tor relays under-perform when running on OpenBSD. Many
> such relays aren't even close to being network-bound,
> file-descriptor-bound, memory-bound, or CPU-bound, but relay at least
> 33-50% less traffic than would be expected of a Linux machine in the
> same situation.

I'm more familiar with NetBSD, but hopefully my comments are helpful.

> For those not familiar, a Tor relay will eventually have an open TCP
> connection for each of the other >6,000 active relays, and (if it allows
> exit traffic) must make outside TCP connections for the user's requests,
> so it's pretty file-hungry and crypto-intensive.

It may also have something to do with TCP.  A few thoughts:

* run netstat -f inet and look at the send queues.  That's not really
  cleanly diagnostic, but if they are all huge, it's a clue

* run netstat -m and vmstat -m (not sure how those map from NetBSD to
  OpenBSD).  Look for running out of mbufs and mbuf clusters (see the
  quick watch loop below).  Perhaps bump up NMBCLUSTERS in the kernel if
  it's not dynamic.

* Take a critical look at your TCP performance.  This is not that easy,
  but it's very informative.   Get and install xplot:
    http://www.xplot.org/
  Take traces of IPv4 TCP traffic with
    tcpdump -w TCP -i wm0 ip and tcp
  and then
    tcpdump -r TCP -tt -n -S | tcpdump2xplot
  Then you'll need to read all the xplot READMEs (see the source).  This
  will show you TCP transmitted segments, SACK blocks, the ack line, dup
  acks, and other TCP behavior.  It's not that easy to follow, but if
  you understand TCP you'll be able to spot odd behavior far faster than
  reading text traces.   It's possible that tcpdump2xplot may mishandle
  OpenBSD's tcpdump output - it's Perl that turns the text back into bits,
  and it has broken over the years with tcpdump upgrades.

  You may well not want to send me a trace, but if you send me the
  binary pcap, the text version above, or the tcpdump2xplot files, I can
  take a look.
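
A rough watch loop for the first two checks, as a sketch only (output
formats differ between the BSDs, so adjust the matching to whatever your
netstat prints):

    # poll every 5 seconds; persistent non-zero send queues, or climbing
    # "denied"/"delayed" counters in netstat -m, are the things to watch
    while sleep 5; do
        date
        netstat -f inet -n | awk '$3 ~ /^[1-9]/'   # sockets with data stuck in Send-Q
        netstat -m
    done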

> One possible explanation is that its randomness store gets exhausted. I
> once saw errors like this in my Tor logs, but I don't know how to test
> if it's a chronic problem. I also couldn't find anything online. Is
> there any easy way to test if this is the bottleneck?

On NetBSD, there is "rndctl -s".  I would expect the same or similar in
OpenBSD, and you can look every second to see if there are bits still in
the pool.  I don't think this will turn out to be the issue, though, if
you're seeing 30% of what you think you should - I would expect the
performance hit due to running out of bits to be much bigger.
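
Something along these lines would do it, as a sketch (this is NetBSD's
rndctl; I don't know whether OpenBSD ships the same tool, and the exact
output wording varies by release):

    # print the entropy-pool statistics once a second and watch whether
    # the "bits currently stored in pool" figure ever hits zero
    while sleep 1; do
        rndctl -s | grep -i bits
    done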

Greg

Re: [Tor-BSD] Recognizing Randomness Exhaustion

Libertas
In reply to this post by Libertas
I also completely forgot to mention the below warning, which Tor
0.2.5.10 (the current release) gives when run on OpenBSD 5.6-stable amd64:

> We were built to run on a 64-bit CPU, with OpenSSL 1.0.1 or later,
> but with a version of OpenSSL that apparently lacks accelerated
> support for the NIST P-224 and P-256 groups. Building openssl with
> such support (using the enable-ec_nistp_64_gcc_128 option when
> configuring it) would make ECDH much faster.

Were the mentioned SSL features removed from LibreSSL, or have they not
yet been introduced? Could this be the culprit?

Re: [Tor-BSD] Recognizing Randomness Exhaustion

Carlin Bingham
On Thu, 1 Jan 2015, at 11:49 AM, Libertas wrote:

> I also completely forgot to mention the below warning, which Tor
> 0.2.5.10 (the current release) gives when run on OpenBSD 5.6-stable
> amd64:
>
> > We were built to run on a 64-bit CPU, with OpenSSL 1.0.1 or later,
> > but with a version of OpenSSL that apparently lacks accelerated
> > support for the NIST P-224 and P-256 groups. Building openssl with
> > such support (using the enable-ec_nistp_64_gcc_128 option when
> > configuring it) would make ECDH much faster.
>
> Were the mentioned SSL features removed from LibreSSL, or have they not
> yet been introduced? Could this be the culprit?
>

It appears the code is still there; it just isn't enabled by default. Some
searching suggests that OpenSSL doesn't enable it by default either, as
the config script can't automatically work out whether the platform
supports it.

As a test I edited /usr/include/openssl/opensslfeatures.h to remove the
OPENSSL_NO_EC_NISTP_64_GCC_128 define, and rebuilt libcrypto.


Running `openssl speed ecdhp224 ecdhp256`:

without acceleration:

                              op      op/s
 224 bit ecdh (nistp224)   0.0003s   3113.0
 256 bit ecdh (nistp256)   0.0004s   2779.1


with acceleration:

                              op      op/s
 224 bit ecdh (nistp224)   0.0001s  10556.8
 256 bit ecdh (nistp256)   0.0002s   4232.4

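For anyone wanting to reproduce this, the steps amount to roughly the
following sketch (the libcrypto build directory is a guess and varies
between releases; rebuild it however your source tree is laid out):

    # drop the line that disables the accelerated NIST code
    grep -v OPENSSL_NO_EC_NISTP_64_GCC_128 /usr/include/openssl/opensslfeatures.h \
        > /tmp/opensslfeatures.h
    cp /tmp/opensslfeatures.h /usr/include/openssl/opensslfeatures.h
    # rebuild and reinstall libcrypto from the system sources
    cd /usr/src/lib/libcrypto && make obj && make && make install
    # then re-run the benchmark
    openssl speed ecdhp224 ecdhp256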

--
Carlin

Re: [Tor-BSD] Recognizing Randomness Exhaustion

Libertas
Thanks for this!

I should have also specified that I didn't just go ahead and enable them
because I wasn't sure if they're considered safe. I like abiding by
OpenBSD's crypto best practices when possible.

Is there any reason why they're disabled by default?

On another note, I was skeptical about this being the cause because even
OpenBSD Tor relays using only <=12% of their CPU capacity have the
characteristic underperformance. Unless there's a latency issue caused
by this, I feel like it's probably something else.

Separately, I'm looking into system call statistics and other ways to
find the problem here. I'm very new to this, so suggestions on tools and
techniques are appreciated.

On 12/31/2014 06:47 PM, Carlin Bingham wrote:

> On Thu, 1 Jan 2015, at 11:49 AM, Libertas wrote:
>> I also completely forgot to mention the below warning, which Tor
>> 0.2.5.10 (the current release) gives when run on OpenBSD 5.6-stable
>> amd64:
>>
>>> We were built to run on a 64-bit CPU, with OpenSSL 1.0.1 or later,
>>> but with a version of OpenSSL that apparently lacks accelerated
>>> support for the NIST P-224 and P-256 groups. Building openssl with
>>> such support (using the enable-ec_nistp_64_gcc_128 option when
>>> configuring it) would make ECDH much faster.
>>
>> Were the mentioned SSL features removed from LibreSSL, or have they not
>> yet been introduced? Could this be the culprit?
>>
>
> It appears the code is still there, just isn't enabled by default. Some
> searching suggests that OpenSSL doesn't enable it by default either as
> the config script can't automatically work out if the platform supports
> it.
>
> As a test I edited /usr/include/openssl/opensslfeatures.h to remove the
> OPENSSL_NO_EC_NISTP_64_GCC_128 define, and rebuilt libcrypto.
>
>
> running `openssl speed ecdhp224 ecdhp256`
>
> without acceleration:
>
>                               op      op/s
>  224 bit ecdh (nistp224)   0.0003s   3113.0
>  256 bit ecdh (nistp256)   0.0004s   2779.1
>
>
> with acceleration:
>
>                               op      op/s
>  224 bit ecdh (nistp224)   0.0001s  10556.8
>  256 bit ecdh (nistp256)   0.0002s   4232.4
>
>
> --
> Carlin

Re: Tor BSD underperformance (was [Tor-BSD] Recognizing Randomness Exhaustion)

teor
In reply to this post by Greg Troxel-2
On 1 Jan 2015, at 07:39 , Greg Troxel <[hidden email]> wrote:

> Libertas <[hidden email]> writes:
>
>> Some of the people at [hidden email] and I are trying to
>> figure out why Tor relays under-perform when running on OpenBSD. Many
>> such relays aren't even close to being network-bound,
>> file-descriptor-bound, memory-bound, or CPU-bound, but relay at least
>> 33-50% less traffic than would be expected of a Linux machine in the
>> same situation.
>
> I'm more familiar with NetBSD, but hopefully my comments are helpful.
>
>> For those not familiar, a Tor relay will eventually have an open TCP
>> connection for each of the other >6,000 active relays, and (if it allows
>> exit traffic) must make outside TCP connections for the user's requests,
>> so it's pretty file-hungry and crypto-intensive.
>
> It may also have something to do with TCP.  A few thoughts:
>
> * run netstat -f inet and look and the send queues.  That's not really
>  cleanly diagnostic, but if they are all huge, it's a clue
>
> * run netstat -m and vmstat -m (not sure those map from NetBSD).  Look
>  for runnig out of mbufs and mbuf clusters.   Perhaps bump up
>  NMBCLUSTERS in the kernel if it's not dynamic.

Tor 0.2.6.2-alpha (just in the process of being released) has some changes to
queuing behaviour using the KIST algorithm.

The KIST algorithm keeps the queues inside tor, and makes prioritisation
decisions from there, rather than writing as much as possible to the OS TCP
queues. I'm not sure how functional it is on *BSDs, but Nick Mathewson should
be able to comment on that. (I've cc'd tor-dev and Nick.)


teor
pgp 0xABFED1AC
hkp://pgp.mit.edu/
https://gist.github.com/teor2345/d033b8ce0a99adbc89c5
http://0bin.net/paste/Mu92kPyphK0bqmbA#Zvt3gzMrSCAwDN6GKsUk7Q8G-eG+Y+BLpe7wtmU66Mx

Re: [Tor-BSD] Recognizing Randomness Exhaustion

Ted Unangst-6
In reply to this post by Libertas
On Wed, Dec 31, 2014 at 19:42, Libertas wrote:
> Thanks for this!
>
> I should have also specified that I didn't just go ahead and enable them
> because I wasn't sure if they're considered safe. I like abiding by
> OpenBSD's crypto best practices when possible.
>
> Is there any reason why they're disabled by default?

Compiler bugs generate incorrect code for 128 bit integers.

Re: [Tor-BSD] Recognizing Randomness Exhaustion

Miod Vallat
> > I should have also specified that I didn't just go ahead and enable them
> > because I wasn't sure if they're considered safe. I like abiding by
> > OpenBSD's crypto best practices when possible.
> >
> > Is there any reason why they're disabled by default?
>
> Compiler bugs generate incorrect code for 128 bit integers.

In slightly more words: we tried enabling this code, and found out the
hard way that, when compiled by the system compiler under OpenBSD, it
produced slightly wrong machine code, causing the computations to be
subtly wrong.

Until someone spends enough time going through the various compiler
versions around to determine which are safe to use and which are not,
this code will remain disabled in LibreSSL.

Miod

Re: [Tor-BSD] Recognizing Randomness Exhaustion

Richard Johnson-8
In reply to this post by Libertas
On 2014-12-31 11:21, Libertas wrote:
> For those not familiar, a Tor relay will eventually have an open TCP
> connection for each of the other >6,000 active relays, and (if it allows
> exit traffic) must make outside TCP connections for the user's requests,
> so it's pretty file-hungry and crypto-intensive.

It can also be pf-state-hungry. Further, each upstream peer Tor node, and each
client on a Tor entry node, will probably be tracked as a pf src node.

Packets being dropped and circuits failing when the pf default limits topped
out would naturally present to the tor bandwidth authorities as network
congestion.

In my case, I'm now fairly certain my relays' usage grew to the point where
they were allocation-bound in pf. The host was still using the pf defaults
until recently.

Since increasing the pf limits, I'm seeing better throughput. The "current
entries" from pfctl -si currently reach 35k instead of hitting the default
limit of 10k. Also, state inserts and removals are up to 50/s from 29/s, and
matches are topping 56/s instead of 30/s. As well, the pfctl -si "memory could
not be allocated" counter remains a reassuring 0 instead of increasing at
0.9/s. Additionally, netstat -m counters for pf* have a reassuring 0 in the
failure column of the memory resource pool stats. Finally, Tor network traffic
seems to have started climbing.
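
(For anyone wanting to check their own relay, those figures come from
roughly the following; the exact counter names are from memory:)

    # overall counters: watch "current entries" against the states limit,
    # and the memory-allocation failure counter
    pfctl -si
    # the hard limits currently in effect (states, src-nodes, frags, ...)
    pfctl -sm
    # memory pool statistics; the pf* pools should show 0 failures
    netstat -m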

I increased the limits thusly, since the host does nothing but Tor and unbound
for Tor DNS.

| # don't choke on lots of circuits (default is states 10000,
| # src-nodes 10000, frags 1536)
| set limit { states 100000, src-nodes 100000, frags 8000, \

> One possible explanation is that its randomness store gets exhausted. I
> once saw errors like this in my Tor logs, but I don't know how to test
> if it's a chronic problem. I also couldn't find anything online. Is
> there any easy way to test if this is the bottleneck?

I suspect Tor won't exhaust randomness; random(4) shouldn't block. (From a
cursory look at the source, Tor references /dev/urandom, and doesn't use
arc4random.)


Richard

Re: [Tor-BSD] Recognizing Randomness Exhaustion

Libertas
I've tuned PF parameters in the past, but it doesn't seem to be the
issue. My current pfctl and netstat -m outputs suggest that there are
more than enough available resources and no reported failures.

I remember someone on [hidden email] suggesting that it could
be at least partially due to PF being slower than other OS's firewalls.

However, we're now finding that a profusion of gettimeofday() syscalls
may be the issue. The same problem was independently discovered by the
operator of IPredator, the highest-bandwidth Tor relay:

        https://ipredator.se/guide/torserver#performance

My 800 KB/s exit node had up to 7,000 gettimeofday() calls a second,
along with hundreds of clock_gettime() calls.
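
(For anyone who wants to reproduce the measurement, one rough way on
OpenBSD is ktrace/kdump against the tor process -- a sketch, assuming the
process is simply named "tor" and that you run it as root:)

    # trace system calls made by the running tor process for ~10 seconds
    ktrace -p $(pgrep -x tor) -t c
    sleep 10
    ktrace -C
    # count the time-related calls; divide by the trace duration for a rate
    kdump | grep CALL | grep -c gettimeofday
    kdump | grep CALL | grep -c clock_gettime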

Because IPredator runs Linux, he used vsyscalls to speed things up.
We'll probably need to find something more creative, like making heavier
use of Tor's time caching.

We're working on it with this ticket:

        https://trac.torproject.org/projects/tor/ticket/14056

On 01/01/2015 10:45 PM, Richard Johnson wrote:

> It can also be pf-state-hungry. Further, each upstream peer Tor node, and each
> client on a Tor entry node, will probably be a pf src.
>
> Packets being dropped and circuits failing when the pf default limits topped
> out would naturally present to the tor bandwidth authorities as network
> congestion.
>
> In my case, I'm now fairly certain my relays usage grew to the point where
> they were allocation-bound in pf. The host was still using the pf defaults
> until recently.
>
> Since increasing the pf limits, I'm seeing better throughput. The "current
> entries" from pfctl -si currently reach 35k instead of hitting the default
> limit of 10k. Also, state inserts and removals are up to 50/s from 29/s, and
> matches are topping 56/s instead of 30/s. As well, the pfctl -si "memory could
> not be allocated" counter remains a reassuring 0 instead of increasing at
> 0.9/s. Additionally, netstat -m counters for pf* have a reassuring 0 in the
> failure column of the memory resource pool stats. Finally, Tor network traffic
> seems to have started climbing.
>
> I increased the limits thusly, since the host does nothing but Tor and unbound
> for Tor DNS.
>
> | # don't choke on lots of circuits (default is states 10000,
> | # src-nodes 10000, frags 1536)
> | set limit { states 100000, src-nodes 100000, frags 8000, \

Re: Recognizing Randomness Exhaustion

Stuart Henderson
In reply to this post by Libertas
On 2014-12-31, Libertas <[hidden email]> wrote:
> One possible explanation is that its randomness store gets exhausted.

OpenBSD's RNG subsystem doesn't get exhausted like this.

Re: [Tor-BSD] Recognizing Randomness Exhaustion

Stuart Henderson
In reply to this post by Miod Vallat
On 2015-01-01, Miod Vallat <[hidden email]> wrote:

>> > I should have also specified that I didn't just go ahead and enable them
>> > because I wasn't sure if they're considered safe. I like abiding by
>> > OpenBSD's crypto best practices when possible.
>> >
>> > Is there any reason why they're disabled by default?
>>
>> Compiler bugs generate incorrect code for 128 bit integers.
>
> In slightly more words, we have tried enabling this code, and found out
> the hard way that, when compiled by the system compiler under OpenBSD,
> it would generate slightly wrong code, and cause computations to be
> subtly wrong.
>
> Until someone spends enough time checking the various compiler versions
> around to check which are safe to use, and which are not, this code will
> remain disabled in LibreSSL.

The specific failure we saw was in openssh; "key_parse_private_pem: bad
ECDSA key" when reading a saved id_ecdsa.

Re: Tor BSD underperformance (was [Tor-BSD] Recognizing Randomness Exhaustion)

Greg Troxel-2
In reply to this post by teor
teor <[hidden email]> writes:

> Tor 0.2.6.2-alpha (just in the process of being released) has some
> changes to queuing behaviour using the KIST algorithm.
>
> The KIST algorithm keeps the queues inside tor, and makes
> prioritisation decisions from there, rather than writing as much as
> possible to the OS TCP queues. I'm not sure how functional it is on
> *BSDs, but Nick Mathewson should be able to comment on that. (I've
> cc'd tor-dev and Nick.)

From skimming the KIST paper (I will read it in detail when I find time),
it seems they are claiming an increase in throughput of around 10%, with
the main benefit being lower latency.  So while that sounds great, it
doesn't seem like lack of KIST is the reason for the apparent 3x
slowdown observed on OpenBSD.

Does anyone have experience to report on any platform other than Linux
or OSX?

Re: [Tor-BSD] Recognizing Randomness Exhaustion

Henning Brauer-4
In reply to this post by Libertas
* Libertas <[hidden email]> [2015-01-02 06:25]:
> I've tuned PF parameters in the past, but it doesn't seem to be the
> issue. My current pfctl and netstat -m outputs suggest that there are
> more than enough available resources and no reported failures.

Just a sidenote: it is safe to bump the default state limit very far,
even on anything semi-modern. The default limit of 10k states is good
for workstations and the like, or tiny embedded-style deployments. I've
gone up to 2M; things get a bit slow if your state table really is
that big, but everything keeps working.
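
In pf.conf that's just the one line (reload with pfctl -f /etc/pf.conf
afterwards):

    set limit states 2000000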

> I remember someone on [hidden email] suggesting that it could
> be at least partially due to PF being slower than other OS's firewalls.

I feel offended :)
Pretty certainly not.

> However, we're now finding that a profusion of gettimeofday() syscalls
> may be the issue. It was independently discovered by the operator of
> IPredator, the highest-bandwidth Tor relay:
>
> https://ipredator.se/guide/torserver#performance
>
> My 800 KB/s exit node had up to 7,000 gettimeofday() calls a second,
> along with hundreds of clock_gettime() calls.

those aren't all that cheap...

--
Henning Brauer, [hidden email], [hidden email]
BS Web Services GmbH, http://bsws.de, Full-Service ISP
Secure Hosting, Mail and DNS. Virtual & Dedicated Servers, Root to Fully Managed
Henning Brauer Consulting, http://henningbrauer.com/