Re: kernel/4604: [Fwd: fxp nics + pf + bridge = panic]

Don Feliciano
The following reply was made to PR kernel/4604; it has been noted by GNATS.

From: Don Feliciano <[hidden email]>
To: Pedro Martelletto <[hidden email]>
Cc: [hidden email], [hidden email]
Subject: Re: kernel/4604: [Fwd: fxp nics + pf + bridge = panic]
Date: Fri, 18 Nov 2005 10:45:06 -0500

 Some more information...
 
 I've determined everything is stable with pf enabled, but with an empty
 ruleset (nothing in pf.conf).  Implementing the minimalistic config
 below crashed for me within seconds of enabling:
 
 #### Interface aliases
 # Interface aliases for ease of administration.
 
 ext_if = "fxp0"      # Untrusted (to LAN)
 int_if = "fxp1"      # Trusted (to switch)
 
 #### Traffic Normalization
 # Prevent fragmentation attacks
 scrub in on $ext_if all fragment reassemble no-df
 scrub out on $ext_if all fragment reassemble random-id no-df
 
 ### Pass traffic on the loopback interface in either direction
 pass quick on lo0 all
 
 #### Internal Bridge interface rules
 # Filter on external interface - in bridge mode,
 # we only filter on one interface.
 pass in quick on $int_if all
 pass out quick on $int_if all
 
 ### Don't filter anything and see if we still panic
 pass in quick on $ext_if all
 pass out quick on $ext_if all
 
 Crash:
 
 panic: pool_get(mclpl): free list modified: magic=c6830e9e; page 0xd38cc000; item addr 0xd38cc800
 Stopped at      Debugger+0x4:   leave
 RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
 DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
 ddb> trace
 Debugger(0,0,0,d38cc800,d05d2760) at Debugger+0x4
 panic(d04f6bc0,d04f8b89,c6830e9e,d38cc000,d38cc800) at panic+0x63
 pool_get(d05d2760,0,da369b00,1,d06f1dfc) at pool_get+0x315
 fxp_start(d0958040,d06f1dfc,e,d09e3c00) at fxp_start+0x2ac
 bridge_ifenqueue(d09ec000,d0958040,da370300,d0958040,da370300) at bridge_ifenqueue+0xa2
 bridgeintr_frame(d09ec000,da370300,0,d06f0000) at bridgeintr_frame+0x270
 bridgeintr(58,10,10,10,d06f0000) at bridgeintr+0x6a
 Bad frame pointer: 0xd06f1e44
 
 I am now trying the same ruleset sans the scrub directives to see if it
 makes any difference.

Re: kernel/4604: [Fwd: fxp nics + pf + bridge = panic]

Don Feliciano
The following reply was made to PR kernel/4604; it has been noted by GNATS.

From: Don Feliciano <[hidden email]>
To: Pedro Martelletto <[hidden email]>
Cc: [hidden email], [hidden email]
Subject: Re: kernel/4604: [Fwd: fxp nics + pf + bridge = panic]
Date: Mon, 21 Nov 2005 06:54:42 -0500

 O.K.  So this bug has been narrowed down to scrub.  A colleague offered
 up this tidbit:
 
 I see that a couple of other people have gotten panics with pool_get
 that also have fxp cards and bridging (all the way back to 3.1,
 actually). I found this nugget, which Daniel posted about his fixes to 3.2:
 
 "These occur only on pf bridges when scrub is enabled. While the bugs
 obviously affect stability, it's uncertain whether they can be exploited."

Re: kernel/4604: [Fwd: fxp nics + pf + bridge = panic]

Daniel Hartmeier
On Mon, Nov 21, 2005 at 05:15:02AM -0700, Don Feliciano wrote:

>  O.K.  So this bug has been narrowed down to scrub.  A colleague offered
>  up this tidbit:

It may be tempting to conclude that scrub is the cause of the problem
(since enabling it reproduces the panic), but that's not necessarily correct.

Enabling scrub on a machine that filters fragments causes pf to allocate
and free pool memory. The panic you posted indicates that pool internals
were corrupted. The problem with debugging this is that the stack trace
of the panic does not show where the corruption occurs, but only the fxp
path which detects it (which is likely far later and far away from where
the corruption occurs).
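
To illustrate why the detecting path says little about the cause, here
is a minimal user-space sketch in C. It is not the OpenBSD pool code:
toy_pool_init/get/put, POOL_MAGIC, the item layout and the stale write
are all hypothetical names and assumptions made up for this example.
The only point is that the check fires in whichever caller allocates
next, not in the code that did the damage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_ITEMS 4
#define POOL_MAGIC 0xdeadbeefU   /* hypothetical value, not the kernel's */

/* Header overlaid on an item while it sits on the free list. */
struct free_item {
    uint32_t magic;
    struct free_item *next;
};

union item {
    struct free_item hdr;        /* view while the item is free */
    unsigned char data[64];      /* view while the item is in use */
};

struct toy_pool {
    union item items[POOL_ITEMS];
    struct free_item *free_list;
};

/* Put every item on the free list and stamp it with the magic value. */
void toy_pool_init(struct toy_pool *p)
{
    p->free_list = NULL;
    for (int i = POOL_ITEMS - 1; i >= 0; i--) {
        p->items[i].hdr.magic = POOL_MAGIC;
        p->items[i].hdr.next = p->free_list;
        p->free_list = &p->items[i].hdr;
    }
}

/*
 * Take the head of the free list. If its magic was overwritten while
 * the item was free, set *corrupt and return NULL (a kernel would
 * panic here instead).
 */
void *toy_pool_get(struct toy_pool *p, int *corrupt)
{
    struct free_item *fi = p->free_list;

    *corrupt = 0;
    if (fi == NULL)
        return NULL;             /* pool exhausted */
    if (fi->magic != POOL_MAGIC) {
        *corrupt = 1;            /* detection point, not the cause */
        return NULL;
    }
    p->free_list = fi->next;
    return fi;
}

/* Return an item to the head of the free list and re-stamp it. */
void toy_pool_put(struct toy_pool *p, void *v)
{
    struct free_item *fi = v;

    assert(v != NULL);
    fi->magic = POOL_MAGIC;
    fi->next = p->free_list;
    p->free_list = fi;
}

/* Returns 1 if the later, unrelated allocation detects the corruption. */
int toy_pool_demo(void)
{
    struct toy_pool pool;
    int corrupt;

    toy_pool_init(&pool);

    /* "Subsystem A" allocates, frees, then writes through a stale pointer. */
    unsigned char *buf = toy_pool_get(&pool, &corrupt);
    toy_pool_put(&pool, buf);
    buf[0] = 0x9e;               /* BUG: write after free clobbers the magic */

    /* Later, "subsystem B" allocates and is the one that trips the check. */
    (void)toy_pool_get(&pool, &corrupt);
    return corrupt;
}
```

In this sketch toy_pool_demo() reports the corruption from the second
toy_pool_get() call, which is innocent. A trace taken at that point
would show only the second caller, much like fxp_start in the trace
above.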

The underlying question is: where does the pool corruption occur (use
after free would be a good guess), and why does it occur only in this
specific combination (fxp, bridge, fragments) but not in others?

To answer that, one has to examine what functions of pf, bridge and fxp
get called, in what order, and what functions are interrupting others.
It's most likely some concurrency/locking/timing issue.

These are hard to debug. Don't hold your breath ;)

Daniel