iked random tunnel drops

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

iked random tunnel drops

Marc West
I have an iked tunnel between two sites, both 6.2 with two machines at
each site running carp and sasyncd. This normally runs flawlessly but
there have been several events where tunnel traffic randomly drops.
Sometimes everything reestablishes automatically about 5 minutes later,
but in some cases I have had to restart iked/sasyncd which brings all
flows and traffic back up immediately and stays stable again for weeks.

From what I have been able to gather so far, flows and SAs consistently
drop from "site A" but are still present on "site B" until the issue
corrects itself or iked/sasyncd are restarted at "site B". Logs look
about the same each time: routine childsa rekeys leading up to the loss
of tunnel traffic, followed by a full ikesa init a few minutes later
either on its own or from restarting. But nothing logged at the moment
traffic drops.

When this occurs there have not been any underlying connectivity issues.
0% packet loss on MTRs between the machines, normal latency, SSH
connections between their public IPs are fine, no carp events.

Any thoughts on tracking this down further would be much appreciated.
Sorry for the length but I'm including logs from two separate events
below as well as configs.

Also, is there a way to send full iked verbosity to syslog so it can be
saved with timestamps?

Many thanks.


--------------------------------------------------------
06:34 - tunnel traffic flowing fine
06:35 - connectivity to remote tunnel networks drops with 100% packet
loss. flows were initially still up on "sitea-router1".
06:36 - same
06:37 - same
06:38 - flows dropped from "sitea-router1"
06:39 - recovery after restarting iked/sasyncd on "siteb-router1"

Site A /var/log/daemon:
Jan  9 06:34:05 sitea-router1 iked[58493]: ikev2_msg_send:
CREATE_CHILD_SA request from a.a.a.a:500 to b.b.b.b:500 msgid 212, 720
bytes
Jan  9 06:34:05 sitea-router1 iked[58493]: ikev2_recv: CREATE_CHILD_SA
response from responder b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id
212, 736 bytes
Jan  9 06:34:05 sitea-router1 iked[58493]: ikev2_msg_send: INFORMATIONAL
request from a.a.a.a:500 to b.b.b.b:500 msgid 213, 80 bytes
Jan  9 06:34:05 sitea-router1 iked[58493]: ikev2_recv: INFORMATIONAL
response from responder b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id
213, 80 bytes
Jan  9 06:34:17 sitea-router1 iked[58493]: ikev2_msg_send:
CREATE_CHILD_SA request from a.a.a.a:500 to b.b.b.b:500 msgid 214, 720
bytes
Jan  9 06:34:17 sitea-router1 iked[58493]: ikev2_recv: CREATE_CHILD_SA
response from responder b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id
214, 736 bytes
Jan  9 06:34:17 sitea-router1 iked[58493]: ikev2_msg_send: INFORMATIONAL
request from a.a.a.a:500 to b.b.b.b:500 msgid 215, 80 bytes
Jan  9 06:34:17 sitea-router1 iked[58493]: ikev2_recv: INFORMATIONAL
response from responder b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id
215, 80 bytes
Jan  9 06:34:32 sitea-router1 iked[58493]: ikev2_msg_send:
CREATE_CHILD_SA request from a.a.a.a:500 to b.b.b.b:500 msgid 216, 720
bytes
Jan  9 06:34:32 sitea-router1 iked[58493]: ikev2_recv: CREATE_CHILD_SA
response from responder b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id
216, 736 bytes
Jan  9 06:34:32 sitea-router1 iked[58493]: ikev2_msg_send: INFORMATIONAL
request from a.a.a.a:500 to b.b.b.b:500 msgid 217, 80 bytes
Jan  9 06:34:32 sitea-router1 iked[58493]: ikev2_recv: INFORMATIONAL
response from responder b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id
217, 80 bytes
Jan  9 06:34:44 sitea-router1 iked[58493]: ikev2_msg_send:
CREATE_CHILD_SA request from a.a.a.a:500 to b.b.b.b:500 msgid 218, 720
bytes
Jan  9 06:34:44 sitea-router1 iked[58493]: ikev2_recv: CREATE_CHILD_SA
response from responder b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id
218, 736 bytes
Jan  9 06:34:44 sitea-router1 iked[58493]: ikev2_msg_send: INFORMATIONAL
request from a.a.a.a:500 to b.b.b.b:500 msgid 219, 80 bytes
Jan  9 06:34:44 sitea-router1 iked[58493]: ikev2_recv: INFORMATIONAL
response from responder b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id
219, 80 bytes
Jan  9 06:36:02 sitea-router1 iked[58493]: ikev2_msg_send: INFORMATIONAL
request from a.a.a.a:500 to b.b.b.b:500 msgid 220, 80 bytes
Jan  9 06:37:03 sitea-router1 iked[58493]: ikev2_msg_send: INFORMATIONAL
request from a.a.a.a:500 to b.b.b.b:500 msgid 221, 80 bytes
Jan  9 06:38:03 sitea-router1 iked[58493]: ikev2_msg_send: INFORMATIONAL
request from a.a.a.a:500 to b.b.b.b:500 msgid 222, 80 bytes
Jan  9 06:39:02 sitea-router1 iked[58493]: ikev2_msg_send: IKE_SA_INIT
request from a.a.a.a:500 to b.b.b.b:500 msgid 0, 510 bytes
Jan  9 06:39:02 sitea-router1 iked[58493]: ikev2_recv: IKE_SA_INIT
response from responder b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id 0,
446 bytes
Jan  9 06:39:02 sitea-router1 iked[58493]: ikev2_msg_send: IKE_AUTH
request from a.a.a.a:500 to b.b.b.b:500 msgid 1, 736 bytes
Jan  9 06:39:02 sitea-router1 iked[58493]: ikev2_recv: IKE_AUTH response
from responder b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id 1, 752 bytes
Jan  9 06:39:02 sitea-router1 iked[58493]: sa_state: VALID ->
ESTABLISHED from b.b.b.b:500 to a.a.a.a:500 policy 'siteb'


Site A /var/log/messages:
Jan  9 06:00:43 sitea-router1 iked[58493]: ikev2_pld_delete: deleted 1
spis
Jan  9 06:01:14 sitea-router1 last message repeated 2 times
Jan  9 06:03:28 sitea-router1 last message repeated 2 times
Jan  9 06:13:20 sitea-router1 last message repeated 22 times
Jan  9 06:23:11 sitea-router1 last message repeated 21 times
Jan  9 06:32:13 sitea-router1 last message repeated 22 times
Jan  9 06:44:14 sitea-router1 iked[58493]: ikev2_pld_delete: deleted 1
spis


Site B /var/log/daemon:
Jan  9 06:34:05 siteb-router1 iked[27432]: ikev2_recv: CREATE_CHILD_SA
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 212,
720 bytes
Jan  9 06:34:05 siteb-router1 iked[27432]: ikev2_msg_send:
CREATE_CHILD_SA response from b.b.b.b:500 to a.a.a.a:500 msgid 212, 736
bytes
Jan  9 06:34:05 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 213,
80 bytes
Jan  9 06:34:05 siteb-router1 iked[27432]: ikev2_pld_delete: deleted 1
spis
Jan  9 06:34:05 siteb-router1 iked[27432]: ikev2_msg_send: INFORMATIONAL
response from b.b.b.b:500 to a.a.a.a:500 msgid 213, 80 bytes
Jan  9 06:34:17 siteb-router1 iked[27432]: ikev2_recv: CREATE_CHILD_SA
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 214,
720 bytes
Jan  9 06:34:17 siteb-router1 iked[27432]: ikev2_msg_send:
CREATE_CHILD_SA response from b.b.b.b:500 to a.a.a.a:500 msgid 214, 736
bytes
Jan  9 06:34:17 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 215,
80 bytes
Jan  9 06:34:17 siteb-router1 iked[27432]: ikev2_pld_delete: deleted 1
spis
Jan  9 06:34:17 siteb-router1 iked[27432]: ikev2_msg_send: INFORMATIONAL
response from b.b.b.b:500 to a.a.a.a:500 msgid 215, 80 bytes
Jan  9 06:34:32 siteb-router1 iked[27432]: ikev2_recv: CREATE_CHILD_SA
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 216,
720 bytes
Jan  9 06:34:32 siteb-router1 iked[27432]: ikev2_msg_send:
CREATE_CHILD_SA response from b.b.b.b:500 to a.a.a.a:500 msgid 216, 736
bytes
Jan  9 06:34:32 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 217,
80 bytes
Jan  9 06:34:32 siteb-router1 iked[27432]: ikev2_pld_delete: deleted 1
spis
Jan  9 06:34:32 siteb-router1 iked[27432]: ikev2_msg_send: INFORMATIONAL
response from b.b.b.b:500 to a.a.a.a:500 msgid 217, 80 bytes
Jan  9 06:34:44 siteb-router1 iked[27432]: ikev2_recv: CREATE_CHILD_SA
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 218,
720 bytes
Jan  9 06:34:44 siteb-router1 iked[27432]: ikev2_msg_send:
CREATE_CHILD_SA response from b.b.b.b:500 to a.a.a.a:500 msgid 218, 736
bytes
Jan  9 06:34:44 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 219,
80 bytes
Jan  9 06:34:44 siteb-router1 iked[27432]: ikev2_pld_delete: deleted 1
spis
Jan  9 06:34:44 siteb-router1 iked[27432]: ikev2_msg_send: INFORMATIONAL
response from b.b.b.b:500 to a.a.a.a:500 msgid 219, 80 bytes
Jan  9 06:36:02 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 220,
80 bytes
Jan  9 06:36:05 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 220,
80 bytes
Jan  9 06:36:09 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 220,
80 bytes
Jan  9 06:36:17 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 220,
80 bytes
Jan  9 06:36:33 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 220,
80 bytes
Jan  9 06:37:03 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 221,
80 bytes
Jan  9 06:37:05 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 221,
80 bytes
Jan  9 06:37:05 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 220,
80 bytes
Jan  9 06:37:09 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 221,
80 bytes
Jan  9 06:37:17 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 221,
80 bytes
Jan  9 06:37:33 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 221,
80 bytes
Jan  9 06:38:03 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 222,
80 bytes
Jan  9 06:38:05 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 222,
80 bytes
Jan  9 06:38:05 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 221,
80 bytes
Jan  9 06:38:09 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 222,
80 bytes
Jan  9 06:39:02 siteb-router1 iked[27432]: ikev2_recv: IKE_SA_INIT
request from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 0,
510 bytes
Jan  9 06:39:02 siteb-router1 iked[27432]: ikev2_msg_send: IKE_SA_INIT
response from b.b.b.b:500 to a.a.a.a:500 msgid 0, 446 bytes
Jan  9 06:39:02 siteb-router1 iked[27432]: ikev2_recv: IKE_AUTH request
from initiator a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 1, 736 bytes
Jan  9 06:39:02 siteb-router1 iked[27432]: ikev2_msg_send: IKE_AUTH
response from b.b.b.b:500 to a.a.a.a:500 msgid 1, 752 bytes
Jan  9 06:39:02 siteb-router1 iked[27432]: sa_state: VALID ->
ESTABLISHED from a.a.a.a:500 to b.b.b.b:500 policy 'sitea'


Site B /var/log/messages:
Jan  9 06:00:04 siteb-router1 iked[27432]: ikev2_pld_delete: deleted 1
spis
Jan  9 06:00:30 siteb-router1 last message repeated 2 times
Jan  9 06:03:03 siteb-router1 last message repeated 8 times
Jan  9 06:12:54 siteb-router1 last message repeated 20 times
Jan  9 06:22:47 siteb-router1 last message repeated 23 times
Jan  9 06:32:53 siteb-router1 last message repeated 19 times
Jan  9 06:34:44 siteb-router1 last message repeated 8 times
Jan  9 06:52:53 siteb-router1 iked[27432]: ikev2_pld_delete: deleted 1
spis

--------------------------------------------------------

Another occurrence from yesterday. Connectivity was lost at 14:15, flows
and SAs dropped from "sitea-router1" but were still present on
"siteb-router1". This one looks a little different with site A sending
IKE_SA_INIT the minute *before* the drop (ikelifetime is the default
zero). Flows/traffic recovered at 14:26 after iked on "siteb-router1"
was restarted:

"Site A" /var/log/daemon:
Jan 17 13:51:56 sitea-router1 iked[58493]: ikev2_recv: CREATE_CHILD_SA
request from initiator b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id
2682, 752 bytes
Jan 17 13:51:56 sitea-router1 iked[58493]: ikev2_msg_send:
CREATE_CHILD_SA response from a.a.a.a:500 to b.b.b.b:500 msgid 2682, 704
bytes
Jan 17 13:51:56 sitea-router1 iked[58493]: ikev2_recv: INFORMATIONAL
request from initiator b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id
2683, 80 bytes
Jan 17 13:51:56 sitea-router1 iked[58493]: ikev2_pld_delete: deleted 1
spis
Jan 17 13:51:56 sitea-router1 iked[58493]: ikev2_msg_send: INFORMATIONAL
response from a.a.a.a:500 to b.b.b.b:500 msgid 2683, 80 bytes
Jan 17 14:11:29 sitea-router1 iked[58493]: ikev2_msg_send:
CREATE_CHILD_SA request from a.a.a.a:500 to b.b.b.b:500 msgid 2602, 720
bytes
Jan 17 14:11:29 sitea-router1 iked[58493]: ikev2_recv: CREATE_CHILD_SA
response from initiator b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id
2602, 736 bytes
Jan 17 14:11:29 sitea-router1 iked[58493]: ikev2_msg_send: INFORMATIONAL
request from a.a.a.a:500 to b.b.b.b:500 msgid 2603, 80 bytes
Jan 17 14:14:28 sitea-router1 iked[58493]: ikev2_msg_send: IKE_SA_INIT
request from a.a.a.a:500 to b.b.b.b:500 msgid 0, 510 bytes
Jan 17 14:17:28 sitea-router1 iked[58493]: ikev2_msg_send: IKE_SA_INIT
request from a.a.a.a:500 to b.b.b.b:500 msgid 0, 510 bytes
Jan 17 14:20:28 sitea-router1 iked[58493]: ikev2_msg_send: IKE_SA_INIT
request from a.a.a.a:500 to b.b.b.b:500 msgid 0, 510 bytes
Jan 17 14:23:28 sitea-router1 iked[58493]: ikev2_msg_send: IKE_SA_INIT
request from a.a.a.a:500 to b.b.b.b:500 msgid 0, 510 bytes
Jan 17 14:26:15 sitea-router1 iked[58493]: ikev2_recv: IKE_SA_INIT
request from initiator b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id 0,
510 bytes
Jan 17 14:26:15 sitea-router1 iked[58493]: ikev2_msg_send: IKE_SA_INIT
response from a.a.a.a:500 to b.b.b.b:500 msgid 0, 446 bytes
Jan 17 14:26:15 sitea-router1 iked[58493]: ikev2_recv: IKE_AUTH request
from initiator b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id 1, 768 bytes
Jan 17 14:26:15 sitea-router1 iked[58493]: ikev2_msg_send: IKE_AUTH
response from a.a.a.a:500 to b.b.b.b:500 msgid 1, 720 bytes
Jan 17 14:26:15 sitea-router1 iked[58493]: sa_state: VALID ->
ESTABLISHED from b.b.b.b:500 to a.a.a.a:500 policy 'siteb'


Site A /var/log/messages:
Jan 17 13:51:56 sitea-router1 iked[58493]: ikev2_pld_delete: deleted 1
spis
Jan 17 14:35:51 sitea-router1 iked[58493]: ikev2_pld_delete: deleted 1
spis


Site B /var/log/daemon:
Jan 17 13:51:56 siteb-router1 iked[27432]: ikev2_msg_send:
CREATE_CHILD_SA request from b.b.b.b:500 to a.a.a.a:500 msgid 2682, 752
bytes
Jan 17 13:51:56 siteb-router1 iked[27432]: ikev2_recv: CREATE_CHILD_SA
response from responder a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id
2682, 704 bytes
Jan 17 13:51:56 siteb-router1 iked[27432]: ikev2_msg_send: INFORMATIONAL
request from b.b.b.b:500 to a.a.a.a:500 msgid 2683, 80 bytes
Jan 17 13:51:57 siteb-router1 iked[27432]: ikev2_recv: INFORMATIONAL
response from responder a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id
2683, 80 bytes
Jan 17 14:11:29 siteb-router1 iked[27432]: ikev2_recv: CREATE_CHILD_SA
request from responder a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id
2602, 720 bytes
Jan 17 14:11:29 siteb-router1 iked[27432]: ikev2_msg_send:
CREATE_CHILD_SA response from b.b.b.b:500 to a.a.a.a:500 msgid 2602, 736
bytes
Jan 17 14:16:22 siteb-router1 iked[68324]: ca exiting, pid 68324
Jan 17 14:16:22 siteb-router1 iked[82558]: control exiting, pid 82558
<note below about restarting iked>
Jan 17 14:26:13 siteb-router1 sasyncd[29654]: shutting down...
Jan 17 14:26:13 siteb-router1 sasyncd[29654]: net_disconnect_peer: peer
"x.x.x.x" removed
Jan 17 14:26:13 siteb-router1 sasyncd[16090]: monitor_loop: priv process
exiting abnormally
Jan 17 14:26:13 siteb-router1 sasyncd[22490]: carp_init: initializing
runstate to MASTER
Jan 17 14:26:13 siteb-router1 sasyncd[22490]: net_connect: peer
"x.x.x.x" connected, fd 6
Jan 17 14:26:13 siteb-router1 sasyncd[74064]: net_ctl: peer "x.x.x.x"
state change to SLAVE
Jan 17 14:26:15 siteb-router1 iked[6934]: ikev2_msg_send: IKE_SA_INIT
request from b.b.b.b:500 to a.a.a.a:500 msgid 0, 510 bytes
Jan 17 14:26:15 siteb-router1 iked[6934]: ikev2_recv: IKE_SA_INIT
response from responder a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 0,
446 bytes
Jan 17 14:26:15 siteb-router1 iked[6934]: ikev2_msg_send: IKE_AUTH
request from b.b.b.b:500 to a.a.a.a:500 msgid 1, 768 bytes
Jan 17 14:26:15 siteb-router1 iked[6934]: ikev2_recv: IKE_AUTH response
from responder a.a.a.a:500 to b.b.b.b:500 policy 'sitea' id 1, 720 bytes
Jan 17 14:26:15 siteb-router1 iked[6934]: sa_state: VALID -> ESTABLISHED
from a.a.a.a:500 to b.b.b.b:500 policy 'sitea'


Site B /var/log/messages:
Jan 17 13:35:42 siteb-router1 iked[27432]: ikev2_pld_delete: deleted 1
spis
Jan 17 14:00:01 siteb-router1 syslogd[63819]: restart
Jan 17 14:26:13 siteb-router1 sasyncd[29654]: shutting down...
Jan 17 14:26:13 siteb-router1 sasyncd[29654]: net_disconnect_peer: peer
"x.x.x.x" removed
Jan 17 14:26:13 siteb-router1 sasyncd[16090]: monitor_loop: priv process
exiting abnormally
Jan 17 14:26:13 siteb-router1 sasyncd[22490]: carp_init: initializing
runstate to MASTER
Jan 17 14:26:13 siteb-router1 sasyncd[22490]: net_connect: peer
"x.x.x.x" connected, fd 6
Jan 17 14:26:13 siteb-router1 sasyncd[74064]: net_ctl: peer "x.x.x.x"
state change to SLAVE


This time restarting iked on "siteb-router1" failed, the control and ca
processes shut down ok but parent/ikev2 processes had to be kill -9.
This usually restarts cleanly.

# rcctl restart iked && rcctl restart sasyncd
iked
(failed)
# ps auxwww | grep iked
root     46911  0.0  0.0  1280  3200 ??  Isp   21Dec17    0:00.02 iked:
parent (iked)
_iked    27432  0.0  0.0  1108  3276 ??  Ip    21Dec17    6:53.97 iked:
ikev2 (iked)
root     96323  0.0  0.0   144   356 p2  R+/1   2:24PM    0:00.00 grep
iked
# rcctl stop iked
iked(failed)
# ps auxwww | grep iked
root     46911  0.0  0.0  1280  3200 ??  Isp   21Dec17    0:00.02 iked:
parent (iked)
_iked    27432  0.0  0.0  1108  3276 ??  Ip    21Dec17    6:53.97 iked:
ikev2 (iked)
root     70420  0.0  0.0   140   340 p2  R+/2   2:25PM    0:00.00 grep
iked
# kill -9 46911
# ps auxwww | grep iked
_iked    27432  0.0  0.0  1108  3276 ??  Ip    21Dec17    6:53.97 iked:
ikev2 (iked)
root     64784  0.0  0.0   144   380 p2  R+/1   2:25PM    0:00.00 grep
iked
# kill 27432
# ps auxwww | grep iked
_iked    27432  0.0  0.0  1108  3276 ??  Sp    21Dec17    6:53.97 iked:
ikev2 (iked)
root     15327  0.0  0.0   212   372 p2  R+/2   2:25PM    0:00.00 grep
iked
# kill -9 27432
# rcctl start iked
iked(ok)
#
--------------------------------------------------------

iked_flags="-S -vvv"
ipsec=YES
sasyncd_flags="-v"

Site A iked.conf:
ikev2 "siteb" active esp \
        from 10.0.50.0/24 to 10.0.10.128/27 \
        from 10.0.50.0/24 to 10.3.0.0/24 \
        from 10.0.50.0/24 to 10.10.0.0/24 \
        from 10.0.50.0/24 to 10.47.0.0/24 \
        from 10.0.50.0/24 to 10.24.4.0/24 \
        from 10.0.50.0/24 to 192.168.160.0/24 \
        from 10.0.50.0/24 to 192.168.180.0/24 \
        from 10.70.0.0/20 to 192.168.160.0/24 \
        from 10.70.32.0/20 to 192.168.160.0/24 \
        from 10.70.64.0/20 to 192.168.160.0/24 \
        from 10.24.8.0/24 to 192.168.160.0/24 \
        from 172.16.0.0/12 to 10.24.4.0/24 \
        from 172.16.0.0/12 to 192.168.160.0/24 \
        from 172.16.0.0/12 to 192.168.180.0/24 \
        from 192.168.70.0/24 to 10.24.4.0/24 \
        from 192.168.70.0/24 to 192.168.160.0/24 \
        from 192.168.102.0/24 to 192.168.160.0/24 \
        local a.a.a.a peer b.b.b.b \
        childsa enc aes-256-gcm \
        srcid a.a.a.a dstid b.b.b.b \
        psk thekey

Site B iked.conf:
ikev2 "sitea" active esp \
        from 10.0.10.128/27 to 10.0.50.0/24 \
        from 10.3.0.0/24 to 10.0.50.0/24 \
        from 10.10.0.0/24 to 10.0.50.0/24 \
        from 10.47.0.0/24 to 10.0.50.0/24 \
        from 10.24.4.0/24 to 10.0.50.0/24 \
        from 10.24.4.0/24 to 172.16.0.0/12 \
        from 10.24.4.0/24 to 192.168.70.0/24 \
        from 192.168.160.0/24 to 10.0.50.0/24 \
        from 192.168.160.0/24 to 10.70.0.0/20 \
        from 192.168.160.0/24 to 10.70.32.0/20 \
        from 192.168.160.0/24 to 10.70.64.0/20 \
        from 192.168.160.0/24 to 10.24.8.0/24 \
        from 192.168.160.0/24 to 172.16.0.0/12 \
        from 192.168.160.0/24 to 192.168.70.0/24 \
        from 192.168.160.0/24 to 192.168.80.0/24 \
        from 192.168.160.0/24 to 192.168.102.0/24 \
        from 192.168.180.0/24 to 10.0.50.0/24 \
        from 192.168.180.0/24 to 172.16.0.0/12 \
        local b.b.b.b peer a.a.a.a \
        childsa enc aes-256-gcm \
        srcid b.b.b.b dstid a.a.a.a \
        psk thekey

active on both sides is maybe redundant but it doesn't cause issues
on a different tunnel, granted that is 6.2<->6.1 instead of 6.2<->6.2.

Reply | Threaded
Open this post in threaded view
|

Re: iked random tunnel drops

Claudio Jeker
On Thu, Jan 18, 2018 at 04:57:14PM -0600, Marc West wrote:

> I have an iked tunnel between two sites, both 6.2 with two machines at
> each site running carp and sasyncd. This normally runs flawlessly but
> there have been several events where tunnel traffic randomly drops.
> Sometimes everything reestablishes automatically about 5 minutes later,
> but in some cases I have had to restart iked/sasyncd which brings all
> flows and traffic back up immediately and stays stable again for weeks.
>
> From what I have been able to gather so far, flows and SAs consistently
> drop from "site A" but are still present on "site B" until the issue
> corrects itself or iked/sasyncd are restarted at "site B". Logs look
> about the same each time: routine childsa rekeys leading up to the loss
> of tunnel traffic, followed by a full ikesa init a few minutes later
> either on its own or from restarting. But nothing logged at the moment
> traffic drops.
>
> When this occurs there have not been any underlying connectivity issues.
> 0% packet loss on MTRs between the machines, normal latency, SSH
> connections between their public IPs are fine, no carp events.
>
> Any thoughts on tracking this down further would be much appreciated.
> Sorry for the length but I'm including logs from two separate events
> below as well as configs.
>
> Also, is there a way to send full iked verbosity to syslog so it can be
> saved with timestamps?
>

I had similar troubles and found out that my pf ruleset on one of the
boxes was too strict and only let IPSec traffic out but not in. So when
the bidirectional state expired half the session died and touching any
iked normally fixed it. So I would double check that.

--
:wq Claudio

Reply | Threaded
Open this post in threaded view
|

Re: iked random tunnel drops

Marc West
On 2018-01-18 23:06:31, Claudio Jeker wrote:

> On Thu, Jan 18, 2018 at 04:57:14PM -0600, Marc West wrote:
> > I have an iked tunnel between two sites, both 6.2 with two machines at
> > each site running carp and sasyncd. This normally runs flawlessly but
> > there have been several events where tunnel traffic randomly drops.
> > Sometimes everything reestablishes automatically about 5 minutes later,
> > but in some cases I have had to restart iked/sasyncd which brings all
> > flows and traffic back up immediately and stays stable again for weeks.
> >
> > From what I have been able to gather so far, flows and SAs consistently
> > drop from "site A" but are still present on "site B" until the issue
> > corrects itself or iked/sasyncd are restarted at "site B". Logs look
> > about the same each time: routine childsa rekeys leading up to the loss
> > of tunnel traffic, followed by a full ikesa init a few minutes later
> > either on its own or from restarting. But nothing logged at the moment
> > traffic drops.
> >
> > When this occurs there have not been any underlying connectivity issues.
> > 0% packet loss on MTRs between the machines, normal latency, SSH
> > connections between their public IPs are fine, no carp events.
> >
> > Any thoughts on tracking this down further would be much appreciated.
> > Sorry for the length but I'm including logs from two separate events
> > below as well as configs.
> >
> > Also, is there a way to send full iked verbosity to syslog so it can be
> > saved with timestamps?
> >
>
> I had similar troubles and found out that my pf ruleset on one of the
> boxes was too strict and only let IPSec traffic out but not in. So when
> the bidirectional state expired half the session died and touching any
> iked normally fixed it. So I would double check that.

Thanks for the suggestion. I triple checked the pf config on both boxes
and all traffic from/to the iked peer IP is permitted. Anything else
come to mind to check?

Reply | Threaded
Open this post in threaded view
|

Re: iked random tunnel drops

danial
I had something similar happening. In my case I solved it by disabling NAT-T
on one end.

/ Danial



--
Sent from: http://openbsd-archive.7691.n7.nabble.com/openbsd-user-misc-f3.html

/ Danial
Reply | Threaded
Open this post in threaded view
|

Re: iked random tunnel drops

Marc West
On 2018-01-23 08:26:43, danial wrote:
> I had something similar happening. In my case I solved it by disabling NAT-T
> on one end.

Unfortunately disabling NAT-T did not have any effect, the issue has
occurred 2 additional times since restarting with -T. There is no NAT or
firewall between these machines aside from PF with everything permitted
to/from peer IPs (confirmed with pflog that nothing legit is getting
blocked). Thanks though.

I was able to get iked running in debug mode outputting with timestamps
to a file. I'll try to summarize and if anyone is willing to look at
the full debug logs I'm happy to send, but it is a lot.

There seems to be two slightly different symptoms, one where iked does
not gracefully restart and has to be kill -9'd, and another where
everything recovers on its own after an ikesa rekey. In both cases an
ikev2_msg_retransmit_timeout is eventually logged for one msgid that is
not received on the other side, even though the underlying connectivity
between the machines is OK. Sometimes tunnel traffic drops minutes
before that retransmit_timeout and flow removal, other times tunnel
traffic passes until the flows are freed.

Example 1 - "Site A" initiates a rekey to "Site B":

Feb 15 23:28:48 pfkey_process: SA 0xb241ab32 is expired, pending rekeying
Feb 15 23:28:48 ikev2_send_create_child_sa: rekeying ESP spi 0xb241ab32
Feb 15 23:28:48 ikev2_pld_parse: header ispi 0x922bc52299efed0e rspi 0x81f6d03c457134b7 nextpayload SK version 0x20 exchange CREATE_CHILD_SA flags 0x00 msgid 80 length 720 response 0
Feb 15 23:28:48 ikev2_msg_send: CREATE_CHILD_SA request from a.a.a.a:500 to b.b.b.b:500 msgid 80, 720 bytes
Feb 15 23:28:48 ikev2_recv: CREATE_CHILD_SA response from initiator b.b.b.b:500 to a.a.a.a:500 policy 'siteb' id 80, 736 bytes
Feb 15 23:28:48 ikev2_recv: ispi 0x922bc52299efed0e rspi 0x81f6d03c457134b7
Feb 15 23:28:48 ikev2_recv: updated SA to peer b.b.b.b:500 local a.a.a.a:500
[...]
Feb 15 23:28:48 ikev2_msg_send: INFORMATIONAL request from a.a.a.a:500 to b.b.b.b:500 msgid 81, 80 bytes
[...]
Feb 15 23:30:55 ikev2_msg_retransmit_timeout: retransmit limit reached for msgid 81
Feb 15 23:30:55 sa_free: ispi 0x922bc52299efed0e rspi 0x81f6d03c457134b7
Feb 15 23:30:55 config_free_proposals: free 0x13b9a0591300
Feb 15 23:30:55 config_free_proposals: free 0x13b9950e9180
Feb 15 23:30:55 config_free_childsas: free 0x13b9a98b3e00
Feb 15 23:30:55 config_free_childsas: free 0x13b9798f7000
Feb 15 23:30:55 sa_free_flows: free 0x13b9afcdec00
[... all flows removed, traffic drops ...]

iked on the other side ("Site B") receives and responds to the
CREATE_CHILD_SA request (msgid 80) but the INFORMATIONAL request msgid
81 above is not seen on the other side. A ctrl-c of iked running in the
foreground stopped the control and ca processes, but parent and ikev2
processes hung and both needed a kill -9 like before. I attached ktrace
to the parent and ikev2 PIDs but no operations were logged on either. Is
there something else I should do in this scenario to debug the hung
processes?

--------------------------------------------------------------

Example 2 - This one eventually recovered on its own with a full ikesa
rekey. Does the "pfkey_process is busy" point to anything? This is only
a very small portion of the debug log, but tunnel traffic stopped
passing at at 05:44 which is minutes before the retransmit timeout and
the flows were actually removed.

Site A:
Feb 19 05:44:51 pfkey_process: SA 0x1f00e21b is expired, pending rekeying
Feb 19 05:45:19 pfkey_process: SA 0x3382f769 is expired, pending rekeying
Feb 19 05:45:35 pfkey_process: SA 0x7b59c4c8 is expired, pending rekeying
Feb 19 05:45:49 pfkey_process: SA 0x985fd6fa is expired, pending rekeying
Feb 19 05:46:03 pfkey_process: SA 0xbc92b3b5 is expired, pending rekeying
Feb 19 05:46:21 ikev2_msg_send: INFORMATIONAL request from a.a.a.a:500 to b.b.b.b:500 msgid 649, 80 bytes
[...]
Feb 19 05:47:49 ikev2_msg_send: INFORMATIONAL request from a.a.a.a:500 to b.b.b.b:500 msgid 650, 80 bytes
[...]
Feb 19 05:49:55 ikev2_msg_retransmit_timeout: retransmit limit reached for msgid 650

Site B:
Again, msgid 649 is received and responded to but not the very next
msgid 650:

Feb 19 05:46:21 ikev2_pld_parse: header ispi 0x43855e1590d3e29a rspi 0x8a04b23198062451 nextpayload SK version 0x20 exchange INFORMATIONAL flags 0x08 msgid 649 length 80 response 0
Feb 19 05:46:21 ikev2_pld_parse: header ispi 0x43855e1590d3e29a rspi 0x8a04b23198062451 nextpayload SK version 0x20 exchange INFORMATIONAL flags 0x20 msgid 649 length 80 response 1
Feb 19 05:46:21 ikev2_msg_send: INFORMATIONAL response from b.b.b.b:500 to a.a.a.a:500 msgid 649, 80 bytes

# grep "Feb 19" iked_debug.log | grep "msgid 650"
#
# grep "Feb 19" iked_debug.log | grep -B1 "is busy"
Feb 19 05:44:50 pfkey_process: SA 0x1f00e21b is expired, pending rekeying
Feb 19 05:44:50 pfkey_dispatch: pfkey_process is busy, retry later
--
Feb 19 05:45:19 pfkey_process: SA 0x3382f769 is expired, pending rekeying
Feb 19 05:45:19 pfkey_dispatch: pfkey_process is busy, retry later
--
Feb 19 05:45:34 pfkey_process: SA 0x7b59c4c8 is expired, pending rekeying
Feb 19 05:45:34 pfkey_dispatch: pfkey_process is busy, retry later
--
Feb 19 05:45:48 pfkey_process: SA 0x985fd6fa is expired, pending rekeying
Feb 19 05:45:48 pfkey_dispatch: pfkey_process is busy, retry later
--
Feb 19 05:46:02 pfkey_process: SA 0xbc92b3b5 is expired, pending rekeying
Feb 19 05:46:02 pfkey_dispatch: pfkey_process is busy, retry later
#