relayd memory usage when loading large URL lists

Felipe Scarel
Hello all,

I'm implementing a simple SSL forward proxy using relayd.
Configuration and testing have both gone fine, but there seems to be
one issue with memory consumption.

To better illustrate my issue, here is an excerpt from /etc/relayd.conf:

http protocol httpsfilter {
  tcp { nodelay, sack, socket buffer 65536, backlog 1024 }
  return error

  match header set "Keep-Alive" value "$TIMEOUT"
  match header set "Connection" value "close"

  pass quick url file "/etc/relayd.d/custom_whitelist"
  block url file "/etc/relayd.d/custom_blacklist"
  include "/etc/relayd.d/auto_blacklist"

  ssl ca key  "/etc/ssl/private/ca.key" password "password"
  ssl ca cert "/etc/ssl/ca.crt"
}

So basically it checks against a custom whitelist, then a custom
blacklist, and finally an "auto" blacklist (which is the main source
of the problem). Using a few URLs with both custom black/white lists
poses no issue, but when attempting to load a somewhat bigger URL list
downloaded from the internet (I'm using
ftp://ftp.ut-capitole.fr/pub/reseau/cache/squidguard_contrib/blacklists.tar.gz)
I run into memory problems.
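
Before pointing relayd at a downloaded list, it can help to know how
many entries it holds. A minimal sketch, assuming one entry per line
with blank lines ignored; count_entries is a hypothetical helper name,
and the paths in the usage comment are placeholders for wherever the
archive was unpacked:

```shell
#!/bin/sh
# count_entries FILE...: print the number of non-blank lines across all
# given files (assumes one URL or domain per line).
count_entries() {
    cat "$@" | grep -c '[^[:space:]]'
}

# Usage (hypothetical paths, adjust to the unpacked archive):
#   count_entries blacklists/phishing/domains blacklists/adult/urls
```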

For example, here is relayd's memory usage when only the custom
white/black lists are loaded (2 URLs total, no big deal):

# ps aux | grep relayd
USER       PID %CPU %MEM   VSZ   RSS TT  STAT  STARTED       TIME COMMAND
_relayd  17238  0.0  0.1  1528  3208 ??  I      3:27PM    0:00.01 relayd: relay (relayd)
_relayd  14280  0.0  0.1  1524  3176 ??  I      3:27PM    0:00.02 relayd: relay (relayd)
_relayd  30448  0.0  0.1  1396  2812 ??  I      3:27PM    0:00.01 relayd: ca (relayd)
_relayd  10020  0.0  0.1  1376  2768 ??  I      3:27PM    0:00.01 relayd: ca (relayd)
_relayd  25775  0.0  0.1  1400  2852 ??  I      3:27PM    0:00.01 relayd: ca (relayd)
root       346  0.0  0.1  1912  3672 ??  Is     3:27PM    0:00.02 relayd: parent (relayd)
_relayd  15883  0.0  0.1  1440  2828 ??  I      3:27PM    0:00.01 relayd: pfe (relayd)
_relayd  32000  0.0  0.1  1220  2560 ??  I      3:27PM    0:00.01 relayd: hce (relayd)
_relayd   2677  0.0  0.1  1516  3188 ??  I      3:27PM    0:00.01 relayd: relay (relayd)

Now loading the "phishing/domains" URL list, which has about 63k
entries. relayd's "parent" process balloons to over 2GB memory usage
(I'm assuming it's reading the URL lists and building a data structure
for the relays), and after that the relays stabilize with the
following memory usage:

# ps aux | grep relayd
USER       PID %CPU %MEM   VSZ   RSS TT  STAT  STARTED       TIME COMMAND
_relayd  12982  0.0 12.9 516728 526288 ??  S      3:31PM    0:03.44 relayd: relay (relayd)
_relayd   1206  0.0  0.1  1368  2836 ??  I      3:31PM    0:00.01 relayd: ca (relayd)
root     25673  0.0  2.7 155616 111228 ??  Is     3:31PM    0:16.35 relayd: parent (relayd)
_relayd  15513  0.0  0.1  1416  2832 ??  S      3:31PM    0:00.01 relayd: pfe (relayd)
_relayd  15643  0.0  0.1  1200  2560 ??  I      3:31PM    0:00.01 relayd: hce (relayd)
_relayd  25822  0.0 12.9 516716 526296 ??  S      3:31PM    0:03.37 relayd: relay (relayd)
_relayd  17950  0.0  0.1  1380  2824 ??  I      3:31PM    0:00.01 relayd: ca (relayd)
_relayd   9068  0.0  0.1  1360  2784 ??  I      3:31PM    0:00.01 relayd: ca (relayd)
_relayd  19666  0.0 12.9 516712 526292 ??  S      3:31PM    0:03.46 relayd: relay (relayd)

So that's about 520 MB of memory per relay process, across 3 relays.
Next I load another URL list alongside the previous one, the
"adult/urls" list, which contains roughly 55k entries. Combined
with the previous list, that makes about 118k URLs for relayd to
process. The "parent" process takes a couple of minutes to process
everything, going over 4GB VSZ and 2.2GB RSS. After all's said and
done, here's what's shown by ps:

# ps aux | grep relayd
USER       PID %CPU %MEM   VSZ   RSS TT  STAT  STARTED       TIME COMMAND
_relayd   6332  0.0  0.1  1428  2228 ??  I      3:35PM    0:00.01 relayd: ca (relayd)
_relayd   8736  0.0 23.9 967808 976768 ??  I      3:35PM    0:06.81 relayd: relay (relayd)
_relayd  22890  0.0 23.9 967812 976768 ??  I      3:35PM    0:06.77 relayd: relay (relayd)
_relayd   5871  0.0 23.9 967804 976760 ??  I      3:35PM    0:06.33 relayd: relay (relayd)
_relayd   8199  0.0  0.1  1440  2256 ??  I      3:35PM    0:00.01 relayd: ca (relayd)
root      5571  0.0  5.3 315032 214796 ??  Is     3:35PM    1:28.45 relayd: parent (relayd)
_relayd  30781  0.0  0.1  1488  2136 ??  S      3:35PM    0:00.01 relayd: pfe (relayd)
_relayd   1502  0.0  0.0  1272  2040 ??  I      3:35PM    0:00.01 relayd: hce (relayd)
_relayd  29135  0.0  0.1  1432  2236 ??  I      3:35PM    0:00.01 relayd: ca (relayd)

Nearly 1GB of RAM per relay process, and ~214 MB for the parent
process. The server I'm working with has 4GB of RAM, so it can't go
much further. If I attempt to load the biggest URL list from the set,
"adult/domains" (slightly above 1 million entries), the server hangs
after a while and needs a hard reset.
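
A rough back-of-the-envelope check of the figures above, assuming relay
RSS grows linearly with the number of entries (an assumption, not
something relayd documents) and using the approximate entry counts
quoted in this message. per_entry_kb and estimate_gb are hypothetical
helper names:

```shell
#!/bin/sh
# per_entry_kb RSS_KB ENTRIES: approximate KB of RSS per list entry.
per_entry_kb() {
    awk -v rss="$1" -v n="$2" 'BEGIN { printf "%.1f\n", rss / n }'
}

# estimate_gb KB_PER_ENTRY ENTRIES: linear extrapolation, in GB
# (1 GB taken as 1048576 KB).
estimate_gb() {
    awk -v k="$1" -v n="$2" 'BEGIN { printf "%.1f\n", k * n / 1048576 }'
}

# RSS values come from the ps output; entry counts are approximate.
per_entry_kb 526288 63000     # phishing/domains only       -> 8.4
per_entry_kb 976768 118000    # plus adult/urls             -> 8.3
estimate_gb 8.3 1000000       # about 1 million entries     -> 7.9
```

Both runs come out at a little over 8 KB of RSS per entry, so a list of
about a million entries would extrapolate to roughly 8 GB per relay
under this assumption, well beyond the 4GB in this machine.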

Is there any configuration parameter I'm missing here? I've reviewed
the manpage a few times, and aside from lowering the number of relays
with "prefork", I can't think of much else. I can, of course, provide
additional information if necessary.

Thanks for your input,
fbscarel

Re: relayd memory usage when loading large URL lists

Felipe Scarel
I forgot to add that I'm running OpenBSD 5.6-release over here. If
needed, I can test with 5.6-stable or -current.

Regards,
fbscarel

Re: relayd memory usage when loading large URL lists

Stuart Henderson
In reply to this post by Felipe Scarel
On 2015-03-01, Felipe Scarel <[hidden email]> wrote:
> Now loading the "phishing/domains" URL list, which has about 63k
> entries. relayd's "parent" process balloons to over 2GB memory usage
> (I'm assuming it's reading the URL lists and building a data structure
> for the relays),

Yes, it's building a red-black tree structure during startup.

> So that's about ~520 MB of memory per relay process, out of 3 total.

This is probably shared (fork does copy-on-write, so forked processes can
just use the original memory unless they make changes to it). Try adjusting
the "prefork" number and check the free memory with top(1) rather than the
per-process memory with ps(1).
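
To see why the ps(1) figures can mislead here, a small sketch that adds
up the RSS column for the relay processes. The sample lines are the
three relay entries from the 63k-entry run, joined onto one line each;
sum_rss_kb is a hypothetical helper name:

```shell
#!/bin/sh
# sum_rss_kb PATTERN: read `ps aux`-style lines on stdin and print the
# total RSS (column 6, in KB) of the lines matching PATTERN.
sum_rss_kb() {
    awk -v pat="$1" '$0 ~ pat { total += $6 } END { print total + 0 }'
}

# Prints 1578876 (KB) for the three relay processes below.
sum_rss_kb 'relayd: relay' <<'EOF'
_relayd  12982  0.0 12.9 516728 526288 ??  S      3:31PM    0:03.44 relayd: relay (relayd)
_relayd  25822  0.0 12.9 516716 526296 ??  S      3:31PM    0:03.37 relayd: relay (relayd)
_relayd  19666  0.0 12.9 516712 526292 ??  S      3:31PM    0:03.46 relayd: relay (relayd)
EOF
```

The naive total is about 1.5 GB, but if the pages holding the tree are
shared copy-on-write, each relay's RSS counts the same physical pages
again. On a live system the same function can be fed from "ps aux" and
the result compared with the free memory that top(1) reports.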

Re: relayd memory usage when loading large URL lists

Felipe Scarel
On Wed, Mar 4, 2015 at 6:29 AM, Stuart Henderson <[hidden email]> wrote:
> On 2015-03-01, Felipe Scarel <[hidden email]> wrote:
>> Now loading the "phishing/domains" URL list, which has about 63k
>> entries. relayd's "parent" process balloons to over 2GB memory usage
>> (I'm assuming it's reading the URL lists and building a data structure
>> for the relays),
>
> Yes, it's building a red-black tree structure during startup.
>

Nice to know.

>> So that's about ~520 MB of memory per relay process, out of 3 total.
>
> This is probably shared (fork does copy-on-write, so forked processes can
> just use the original memory unless they make changes to it). Try adjusting
> the "prefork" number and check the free memory with top(1) rather than the
> per-process memory with ps(1).
>

Alright, I'll do that. In other news, Reyk replied to me via Twitter
saying that relayd "is not optimized for large blacklists yet". I'll
keep using the current version for the time being, as ~100k URLs is
sufficient for my current demand.

Thanks for your help!