pppoe repeated disconnects

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

pppoe repeated disconnects

Pim van Pelt
Hoi,

I have recently switched from a Zyxel modem terminated pppoe
connection, to an OpenBSD based termination of the pppoe connection. I
think I have a reasonable configuration, and both kernel as well as
userland pppoe configs connect fine, but after a few minutes the
ingress traffic halts and the connection drops. I am running 5.2, with
the following /etc/ppp/ppp.conf ([1] for userland) and
/etc/hostname.pppoe0 ([2] for kernel). Both exhibit the same issue.
When the connection establishes, I have the following device:
$ ifconfig pppoe0
pppoe0: flags=8851<UP,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST> mtu 1492
        priority: 0
        dev: vlan5 state: session
        sid: 0x1268 PADI retries: 8 PADR retries: 0 time: 00:04:02
        sppp: phase network authproto chap authname "[hidden email]"
        groups: pppoe
        status: active
        inet6 fe80::260:e0ff:fe53:7978%pppoe0 ->  prefixlen 64 scopeid 0x1e
        inet  81.6.62.85 --> 80.254.161.242 netmask 0xffffffff

The ISP routes me 81.6.62.84/30 and when the connection is up, this
works (I can ping all 4 IP addresses). There's a netscreen at
81.6.62.86 behind this machine and it is responding to ping's from the
internet just fine. A few minutes later, traffic stops flowing and
about one minute later, pppoe0 disconnects and then reconnects a few
minutes later, to rinse and repeat. So I tcpdump'ed the kernel pppoe,
and noticed the following snippet (see [3] for details). I am MAC
00:60:e0:53:79:79 , the ISP is MAC 00:90:1a:a4:8d:20.

At 00:27:30.706636 I see ingress to .86, which is replied at
00:27:30.708050 egress. Then, ingress halts. I send three echos at
00:27:52.564297, 00:28:02.563980 and 00:28:12.563665, and then give up
at 00:28:22.563420 and send a Terminate-Request. I then send three
Initiation requests, which are answered finally at 00:29:27.771647 by
the ISP. The connection re-establishes and the cycle repeats.

When I try the same thing using userland ppp (to benefit from
debugging), things look pretty good to me, but also after a few
minutes the physical link goes silent and I disconnect. The logs[4]
show that the connection establishes (and on the ppp command line,
'show ipcp' and 'show lcp' look fine to me), but the problem persists.

I've read up on ppp/pppd/pppoe for both userspace and kernel, but I'm
stumped and at this point I believe the problem is with the ISP (I say
this because traffic stops flowing while I'm sending LCP echos, and I
always reply the ISPs echos, but after a few failed echos pppoe on my
machine gives up). Can somebody please confirm this before I open a
problem ticket with my ISP, or help me find a way to gather more
information to diagnose this issue?

Kind regards,
Pim/ Zurich



[1] cat /etc/ppp/ppp.conf
default:
  set log Phase Chat LCP IPCP CCP tun command
  set speed sync
  disable acfcomp protocomp shortseq vjcomp
  deny acfcomp

green:
  set device "!/usr/sbin/pppoe -i vlan5"
  set mtu max 1492
  set mru max 1492
  set authname "[hidden email]"
  set authkey "Y"
  set ifaddr 0.0.0.0 0.0.0.0
  disable echo lqr
  deny lqr

[2] cat /etc/hostname.pppoe0
inet 0.0.0.0 255.255.255.255 NONE pppoedev vlan5 authproto chap
authname '[hidden email]' authkey 'Y' up
dest 0.0.0.1

[3] http://pastebin.com/ph6xrNaW

[4] http://pastebin.com/grifHJ41
--
Pim van Pelt <[hidden email]>
PBVP1-RIPE - http://www.ipng.nl/

Reply | Threaded
Open this post in threaded view
|

Re: pppoe repeated disconnects

Stuart Henderson
On 2013-02-21, Pim van Pelt <[hidden email]> wrote:

> Hoi,
>
> I have recently switched from a Zyxel modem terminated pppoe
> connection, to an OpenBSD based termination of the pppoe connection. I
> think I have a reasonable configuration, and both kernel as well as
> userland pppoe configs connect fine, but after a few minutes the
> ingress traffic halts and the connection drops. I am running 5.2, with
> the following /etc/ppp/ppp.conf ([1] for userland) and
> /etc/hostname.pppoe0 ([2] for kernel). Both exhibit the same issue.
> When the connection establishes, I have the following device:
> $ ifconfig pppoe0
> pppoe0: flags=8851<UP,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST> mtu 1492
>         priority: 0
>         dev: vlan5 state: session
>         sid: 0x1268 PADI retries: 8 PADR retries: 0 time: 00:04:02
>         sppp: phase network authproto chap authname "[hidden email]"
>         groups: pppoe
>         status: active
>         inet6 fe80::260:e0ff:fe53:7978%pppoe0 ->  prefixlen 64 scopeid 0x1e
>         inet  81.6.62.85 --> 80.254.161.242 netmask 0xffffffff
>
> The ISP routes me 81.6.62.84/30 and when the connection is up, this
> works (I can ping all 4 IP addresses). There's a netscreen at
> 81.6.62.86 behind this machine and it is responding to ping's from the
> internet just fine. A few minutes later, traffic stops flowing and
> about one minute later, pppoe0 disconnects and then reconnects a few
> minutes later, to rinse and repeat. So I tcpdump'ed the kernel pppoe,
> and noticed the following snippet (see [3] for details). I am MAC
> 00:60:e0:53:79:79 , the ISP is MAC 00:90:1a:a4:8d:20.
>
> At 00:27:30.706636 I see ingress to .86, which is replied at
> 00:27:30.708050 egress. Then, ingress halts. I send three echos at
> 00:27:52.564297, 00:28:02.563980 and 00:28:12.563665, and then give up
> at 00:28:22.563420 and send a Terminate-Request. I then send three
> Initiation requests, which are answered finally at 00:29:27.771647 by
> the ISP. The connection re-establishes and the cycle repeats.

Your log only shows the last incoming packet and a few unanswered
outgoing packets, so it's not enough to tell, does the ISP reply to
any of your earlier LCP echo requests or does it ignore all of them?

Normally if there is a period of 30 seconds with either no LCP
keepalives or no user data, pppoe(4) will drop the session and log
"LCP keepalive timeout", which it looks like should show up at the
bottom of dmesg output.

So, if they do normally respond to LCP, it looks like the link
is dropping out somehow..

Or, if they don't normally respond to LCP, you may either need some
other way to generate data to keep the session alive, disable the
timeout (by hacking the driver), or ideally see if they can start
responding (easier if it's an ISP compliant with xkcd 806 ;)

> I've read up on ppp/pppd/pppoe for both userspace and kernel, but I'm
> stumped and at this point I believe the problem is with the ISP (I say
> this because traffic stops flowing while I'm sending LCP echos, and I
> always reply the ISPs echos, but after a few failed echos pppoe on my
> machine gives up). Can somebody please confirm this before I open a
> problem ticket with my ISP, or help me find a way to gather more
> information to diagnose this issue?

> [1] cat /etc/ppp/ppp.conf
> default:
>   set log Phase Chat LCP IPCP CCP tun command
>   set speed sync
>   disable acfcomp protocomp shortseq vjcomp
>   deny acfcomp
>
> green:
>   set device "!/usr/sbin/pppoe -i vlan5"
>   set mtu max 1492
>   set mru max 1492
>   set authname "[hidden email]"
>   set authkey "Y"
>   set ifaddr 0.0.0.0 0.0.0.0
>   disable echo lqr
>   deny lqr

I haven't really used iij ppp since around 1998 so not too sure about
syntax here (though it seems ok).. but

> [2] cat /etc/hostname.pppoe0
> inet 0.0.0.0 255.255.255.255 NONE pppoedev vlan5 authproto chap
> authname '[hidden email]' authkey 'Y' up
> dest 0.0.0.1

this definitely looks right to me, and I have a bunch of pppoe(4)
running over vlans, so there's no problem in that respect.

> [3] http://pastebin.com/ph6xrNaW
>
> [4] http://pastebin.com/grifHJ41

Reply | Threaded
Open this post in threaded view
|

Re: pppoe repeated disconnects

Pim van Pelt
Hoi,

Thanks for helping me think about this, Stu.

2013/2/22 Stuart Henderson <[hidden email]>:

>> At 00:27:30.706636 I see ingress to .86, which is replied at
>> 00:27:30.708050 egress. Then, ingress halts. I send three echos at
>> 00:27:52.564297, 00:28:02.563980 and 00:28:12.563665, and then give up
>> at 00:28:22.563420 and send a Terminate-Request. I then send three
>> Initiation requests, which are answered finally at 00:29:27.771647 by
>> the ISP. The connection re-establishes and the cycle repeats.
>
> Your log only shows the last incoming packet and a few unanswered
> outgoing packets, so it's not enough to tell, does the ISP reply to
> any of your earlier LCP echo requests or does it ignore all of them?
You are right, I only pasted the tail end of the data, but I do have
the whole file still:
http://www.ipng.nl/pppoe-tcpdump/pppoe4.tcpdump (kernel)
http://www.ipng.nl/pppoe-tcpdump/pppoe8.tcpdump (userland)

In the call I was referring to above (pppoe4.tcpdump, the kernel
version), a whole session lasts from 00:25:32.588726 to
00:28:22.583343.
I took another look and it seems that my side does not send LCP echos,
but it does reply to them. Typically, the ISP (mac ending in 8d:20)
sends LCP echos every 10s, and my side (mac ending in 79:79) responds
to them, for example:
00:26:33.656302 00:90:1a:a4:8d:20 00:60:e0:53:79:79 8864 60:
PPPoE-Session code Session, version 1, type 1, id 0x124a, length 14
LCP: Echo-Request, Magic-Number=373307203
00:26:33.656321 00:60:e0:53:79:79 00:90:1a:a4:8d:20 8864 34:
PPPoE-Session code Session, version 1, type 1, id 0x124a, length 14
LCP: Echo-Reply, Magic-Number=1979569634

You mentioned that I would/should be sending echos as well, but my
side definitely does not do that every N seconds. It does, however,
when the connection did not see packets for a while, for example at
00:24:42.570312, 00:24:52.569996, 00:25:02.569679 (none responded by
ISP), and it then sends terminate-request at 00:25:12.569442. I
counted the Echo-Requests, I saw 20 from them, and replied 20 times. I
sent 6 to them, they replied 0 times. In the userland pppoe(8) I also
saw 20 of them, replied 20 times. I sent 0 to them, they replied 0
times. But I think it's relevant to note that I only sent LCP
Echo-Requests just before the line was disconnected (The one I
mentioned before is 00:24:42.570312 but there's another example at
00:27:52.564297) - they could very well not be replied because the ISP
stopped forwarding traffic to me, or the DSL has untrained (i mean,
while the LCPs are are being sent, I am also not seeing any IP traffic
coming in, whlie I have a ping running in the background, I should be
seeing 2 packets per second, for 40secs I see nothing and then see the
line disconnect).

> Normally if there is a period of 30 seconds with either no LCP
> keepalives or no user data, pppoe(4) will drop the session and log
> "LCP keepalive timeout", which it looks like should show up at the
> bottom of dmesg output.
Confirmed. I can see those 3 LCP packets with 10s interval, and then
my side sending Terminate-Request and immediately thereafter
Terminate.

> So, if they do normally respond to LCP, it looks like the link
> is dropping out somehow..
I don't think we can conclude if they normally respond to LCP or not,
because at it seems I am not normally sending LCP Echos .. and I don't
see how to set this in ifconfig(8) or sppp(4) or pppoe(4). Do you have
any idea why my side is not sending LCP echos, or how I can configure
it to?

> Or, if they don't normally respond to LCP, you may either need some
> other way to generate data to keep the session alive, disable the
> timeout (by hacking the driver), or ideally see if they can start
> responding (easier if it's an ISP compliant with xkcd 806 ;)
There's ingress and egress data ~always. In my examples, it's a
running ping from internet to me. The ping just times out after a
while, for up to a minute or so while my side is sending (sometimes
multiple!) Initiation frames, for example 00:28:22.583356,
00:28:27.583185, 00:29:27.581285. Then the ISP responds with an Offer
and the cycle restarts.

The only other thing I can think of is that actually my DSL modem is
dropping the connection. I should easily be able to find a pocket of
downtime and inspect the DSL line status because the LCP keepalive
timeouts occur more than 100 times per day (there can be 8 hours
between them, or as little as 3 minutes) -
http://pastebin.com/1VikG019

groet,
Pim

--
Pim van Pelt <[hidden email]>
PBVP1-RIPE - http://www.ipng.nl/

Reply | Threaded
Open this post in threaded view
|

Re: pppoe repeated disconnects

Stuart Henderson
On 2013-02-24, Pim van Pelt <[hidden email]> wrote:

>> Your log only shows the last incoming packet and a few unanswered
>> outgoing packets, so it's not enough to tell, does the ISP reply to
>> any of your earlier LCP echo requests or does it ignore all of them?
> You are right, I only pasted the tail end of the data, but I do have
> the whole file still:
> http://www.ipng.nl/pppoe-tcpdump/pppoe4.tcpdump (kernel)
> http://www.ipng.nl/pppoe-tcpdump/pppoe8.tcpdump (userland)
>
> In the call I was referring to above (pppoe4.tcpdump, the kernel
> version), a whole session lasts from 00:25:32.588726 to
> 00:28:22.583343.
> I took another look and it seems that my side does not send LCP echos,
> but it does reply to them.

Ah... this might indeed be normal then, the line I was looking at has
echoes every second sent by my ISP (http://aa.net.uk/kb-broadband-cqm.html)
so I wouldn't have noticed whether my side was sending them (I was looking
at tcpdump on the pppoe interface where it's not clear who sends them).
I suspect I was wrong then and we only send them if the line is quiet.

> You mentioned that I would/should be sending echos as well, but my
> side definitely does not do that every N seconds. It does, however,
> when the connection did not see packets for a while, for example at
> 00:24:42.570312, 00:24:52.569996, 00:25:02.569679 (none responded by
> ISP), and it then sends terminate-request at 00:25:12.569442. I
> counted the Echo-Requests, I saw 20 from them, and replied 20 times. I
> sent 6 to them, they replied 0 times. In the userland pppoe(8) I also
> saw 20 of them, replied 20 times. I sent 0 to them, they replied 0
> times. But I think it's relevant to note that I only sent LCP
> Echo-Requests just before the line was disconnected (The one I
> mentioned before is 00:24:42.570312 but there's another example at
> 00:27:52.564297) - they could very well not be replied because the ISP
> stopped forwarding traffic to me, or the DSL has untrained (i mean,
> while the LCPs are are being sent, I am also not seeing any IP traffic
> coming in, whlie I have a ping running in the background, I should be
> seeing 2 packets per second, for 40secs I see nothing and then see the
> line disconnect).

It does seem like the line could well be dropping out then.
Are you able to setup syslog on the DSL modem and get the logs sent to
the box running pppoe? Having the two logs together would make it
easier to correlate any problems logged by the modem with those logged
by pppoe. (it will need 'syslog_flags="-u"' in rc.conf.local to accept
from the network, and /etc/rc.d/syslogd restart). Maybe you could also
leave a ping running to the modem to spot if there's a problem with
the ethernet link or something?

Reply | Threaded
Open this post in threaded view
|

Re: pppoe repeated disconnects

Pim van Pelt
Hoi once more,

Closure on this.

2013/2/24 Stuart Henderson <[hidden email]>:
> On 2013-02-24, Pim van Pelt <[hidden email]> wrote:
>> .. , or the DSL has untrained
>
> It does seem like the line could well be dropping out then.
.. and that was it. I have since replaced the DSL modem with a
Fritz!Box 7390 and the line has been up for 6 days without pretty much
a single packet dropped. As a bonus, it now trains at 50/10 where it
used to be 30/6 or something significantly weaker. That is sufficient
for me to prove that (a) the OpenBSD pppoe(4) and pppoe(8) setups were
fine, and (b) DSL modem vendors make a difference :)

Thanks Stuart, for helping me think this through, and others on this
list for lurking.

groet,
Pim
--
Pim van Pelt <[hidden email]>
PBVP1-RIPE - http://www.ipng.nl/