firefox (or some rthread / network stuff) broken in -current


Matthieu Herrb-7
Hi,

I upgraded my laptop from sources to -current yesterday. Since then
firefox stops resolving host names after a dozen minutes or so.
The network.dnsCacheExpiration=0 setting mentioned by phessler on
mastodon doesn't help.

It's not the firefox port itself, since it started after I rebooted into
the new kernel, while base and xenocara were rebuilding. My previous
kernel was from January 31.

Other programs are fine (including chromium), but apart from maybe
chromium, nothing I run uses threads and name resolution at the same time.

Any idea what's causing this, or should I start bisecting?

--
Matthieu Herrb


Re: firefox (or some rthread / network stuff) broken in -current

Peter N. M. Hansteen-3
On 02/11/18 12:37, Matthieu Herrb wrote:
> I upgraded my laptop from sources to -current yesterday. Since then
> firefox stops resolving host names after a dozen minutes or so.
> The network.dnsCacheExpiration=0 setting mentioned by phessler on
> mastodon doesn't help.
>
> It's not the firefox port itself, since it started after I rebooted into
> the new kernel, while base and xenocara were rebuilding. My previous
> kernel was from January 31.

I'm seeing the same thing here, but I jump from snapshot to snapshot on my
laptop and I saw this change after upgrading to the February 10 snapshot.

> Other programs are fine (including chromium), but apart from maybe
> chromium, nothing I run uses threads and name resolution at the same time.

For some reason the most clearly affected ones are firefox and
thunderbird, but even name resolution from the shell and things like
pkg_add seem to resolve names slower than they used to.

- Peter
--
Peter N. M. Hansteen, member of the first RFC 1149 implementation team
http://bsdly.blogspot.com/ http://www.bsdly.net/ http://www.nuug.no/
"Remember to set the evil bit on all malicious network traffic"
delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.


Re: firefox (or some rthread / network stuff) broken in -current

Otto Moerbeek
In reply to this post by Matthieu Herrb-7
On Sun, Feb 11, 2018 at 12:37:09PM +0100, Matthieu Herrb wrote:

> Hi,
>
> I upgraded my laptop from sources to -current yesterday. Since then
> firefox stops resolving host names after a dozen minutes or so.
> The network.dnsCacheExpiration=0 setting mentioned by phessler on
> mastodon doesn't help.
>
> It's not the firefox port itself, since it started after I rebooted into
> the new kernel, while base and xenocara were rebuilding. My previous
> kernel was from January 31.
>
> Other programs are fine (including chromium), but apart from maybe
> chromium, nothing I run uses threads and name resolution at the same time.
>
> Any idea what's causing this, or should I start bisecting?
>
> --
> Matthieu Herrb

There's been a commit to libc/asr, but afaik firefox does its own resolving.
Plus what you describe suggests a kernel change.

        -Otto


Re: firefox (or some rthread / network stuff) broken in -current

Mark Kettenis
In reply to this post by Peter N. M. Hansteen-3
> From: "Peter N. M. Hansteen" <[hidden email]>
> Date: Sun, 11 Feb 2018 12:49:49 +0100
>
> On 02/11/18 12:37, Matthieu Herrb wrote:
> > I upgraded my laptop from sources to -current yesterday. Since then
> > firefox stops resolving host names after a dozen minutes or so.
> > The network.dnsCacheExpiration=0 setting mentioned by phessler on
> > mastodon doesn't help.
> >
> > It's not the firefox port itself, since it started after I rebooted into
> > the new kernel, while base and xenocara were rebuilding. My previous
> > kernel was from January 31.
>
> I'm seeing the same thing here, but I jump from snapshot to snapshot on my
> laptop and I saw this change after upgrading to the February 10 snapshot.
>
> > Other programs are fine (including chromium), but apart from maybe
> > chromium, nothing I run uses threads and name resolution at the same time.
>
> For some reason the most clearly affected ones are firefox and
> thunderbird, but even name resolution from the shell and things like
> pkg_add seem to resolve names slower than they used to.

Does reverting the libc asr changes help?


Re: firefox (or some rthread / network stuff) broken in -current

Matthieu Herrb-7
On Sun, Feb 11, 2018 at 01:09:36PM +0100, Mark Kettenis wrote:

> > From: "Peter N. M. Hansteen" <[hidden email]>
> > Date: Sun, 11 Feb 2018 12:49:49 +0100
> >
> > On 02/11/18 12:37, Matthieu Herrb wrote:
> > > I upgraded my laptop from sources to -current yesterday. Since then
> > > firefox stops resolving host names after a dozen minutes or so.
> > > The network.dnsCacheExpiration=0 setting mentioned by phessler on
> > > mastodon doesn't help.
> > >
> > > It's not the firefox port itself, since it started after I rebooted into
> > > the new kernel, while base and xenocara were rebuilding. My previous
> > > kernel was from January 31.
> >
> > I'm seeing the same thing here, but I jump from snapshot to snapshot on my
> > laptop and I saw this change after upgrading to the February 10 snapshot.
> >
> > > Other programs are fine (including chromium), but apart from maybe
> > > chromium, nothing I run uses threads and name resolution at the same time.
> >
> > For some reason the most clearly affected ones are firefox and
> > thunderbird, but even name resolution from the shell and things like
> > pkg_add seem to resolve names slower than they used to.
>
> Does reverting the libc asr changes help?

Unfortunately, no.

--
Matthieu Herrb


Re: firefox (or some rthread / network stuff) broken in -current

Martin Pieuchot
In reply to this post by Matthieu Herrb-7
On 11/02/18(Sun) 12:37, Matthieu Herrb wrote:
> Hi,
>
> I upgraded my laptop from sources to -current yesterday. Since then
> firefox stops resolving host names after a dozen minutes or so.

What do you mean by "stops resolving host names"?  What happens?  What
do you see?

How did you figure out it was a name resolution problem?

A firefox error page?

You saw some syscalls failing via ktrace?

You looked at tcpdump outputs?

Some network counters were increasing?

> It's not the firefox port itself, since it started after I rebooted into
> the new kernel, while base and xenocara were rebuilding. My previous
> kernel was from January 31.

So you're saying that *some* kernels newer than January 31 expose this
regression?  Could you bisect when it was introduced?

> Other programs are fine (including chromium), but apart from maybe
> chromium, nothing I run uses threads and name resolution at the same time.

You're assuming it's related to threaded applications. Why?

> Any idea of what's causing this, or should I start bisecting ?

A bisection would be great.
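
For readers unfamiliar with the term, the bisection requested here is a binary search over an ordered list of commits. The following is a minimal sketch in C, assuming the commits are numbered oldest to newest and that every commit after the first bad one is also bad. The names `first_bad`, `is_bad_fn` and `threshold_is_bad` are hypothetical; in practice the predicate stands for "build and boot that kernel, then use firefox long enough to see whether name resolution stalls".

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical predicate: returns nonzero when the build at position
 * idx (oldest-to-newest order) shows the regression.
 */
typedef int (*is_bad_fn)(size_t idx, void *ctx);

/*
 * Return the index of the first bad commit.
 * Preconditions: index `good` is known good, index `bad` is known bad,
 * good < bad, and everything after the first bad commit is also bad.
 */
size_t
first_bad(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
    assert(good < bad);

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid, ctx))
            bad = mid;      /* regression already present at mid */
        else
            good = mid;     /* still fine at mid */
    }
    return bad;
}

/*
 * Stand-in predicate for experimenting: every index at or above the
 * threshold pointed to by ctx counts as bad.
 */
int
threshold_is_bad(size_t idx, void *ctx)
{
    return idx >= *(size_t *)ctx;
}
```

Each step halves the remaining range, which matters when a single test takes a dozen minutes or more to show the problem.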


Re: firefox (or some rthread / network stuff) broken in -current

Matthieu Herrb-7
On Sun, Feb 11, 2018 at 02:50:30PM +0100, Martin Pieuchot wrote:

> On 11/02/18(Sun) 12:37, Matthieu Herrb wrote:
> > Hi,
> >
> > I upgraded my laptop from sources to -current yesterday. Since then
> > firefox stops resolving host names after a dozen minutes or so.
>
> What do you mean by "stops resolving host names"?  What happens?  What
> do you see?
>
> How did you figure out it was a name resolution problem?
>
> A firefox error page?

Firefox says 'Resolving herrb.net' in the popup message area at the
bottom of the window, plays the little animation with a dot moving
left to right and back in the tab, and the tab stays blank.

>
> You saw some syscalls failing via ktrace?

Yes, I saw a read on fd 9 failing with EAGAIN. This is a unix socket, but
I don't know how to figure out the path with fstat.
>
> You looked at tcpdump outputs?

No visible port 53 activity while this happens.
>
> Some network counters were increasing?

I haven't looked at other data.

>
> > It's not the firefox port itself, since it started after I rebooted into
> > the new kernel, while base and xenocara were rebuilding. My previous
> > kernel was from January 31.
>
> So you're saying that *some* kernels newer than January 31 expose this
> regression?  Could you bisect when it was introduced?

>
> > Other programs are fine (including chromium), but apart from maybe
> > chromium, nothing I run uses threads and name resolution at the same time.
>
> You're assuming it's related to threaded applications. Why?

Because ssh, ftp or dig do resolve names without issues.
>
> > Any idea of what's causing this, or should I start bisecting ?
>
> A bisection would be great.

That's what I'm doing now. Before starting I was looking for obvious
changes during a2k18 that I could try to back out before doing the
bisect, since the problem takes some time to manifest.

Kettenis suggested the libc/asr commit, but that's not it, since I could
reproduce the problem without this commit.

I'm currently on a kernel that seems ok:
https://github.com/openbsd/src/commit/54fc14edb8e865a7c7cea20937dce12585b31555

--
Matthieu Herrb
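
As background for the ktrace observation in this message: a read on a non-blocking socket with no data pending fails with EAGAIN, and that is normal behaviour rather than an error in itself. The sketch below reproduces it in userland C on a unix socket pair. The function name is hypothetical, and the sketch says nothing about what firefox's fd 9 actually was.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a connected pair of unix sockets, make one end non-blocking
 * and read from it while nothing has been written to the other end.
 * Returns 1 if read(2) failed with EAGAIN/EWOULDBLOCK, 0 if it did
 * something else, -1 if the setup itself failed.
 */
int
empty_nonblocking_read_gives_eagain(void)
{
    int sv[2], flags, saved, result;
    char buf[16];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    flags = fcntl(sv[0], F_GETFL);
    if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    n = read(sv[0], buf, sizeof(buf));  /* no data is pending */
    saved = errno;                      /* save before close() can change it */
    result = (n == -1 && (saved == EAGAIN || saved == EWOULDBLOCK));

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

A program that polls such a socket will see EAGAIN whenever the peer has nothing to say, so the trace entry alone does not show which side stopped responding.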


Re: firefox (or some rthread / network stuff) broken in -current

Landry Breuil-5
On Sun, Feb 11, 2018 at 03:13:07PM +0100, Matthieu Herrb wrote:

> On Sun, Feb 11, 2018 at 02:50:30PM +0100, Martin Pieuchot wrote:
> > On 11/02/18(Sun) 12:37, Matthieu Herrb wrote:
> > > Hi,
> > >
> > > I upgraded my laptop from sources to -current yesterday. Since then
> > > firefox stops resolving host names after a dozen minutes or so.
> >
> > What do you mean by "stops resolving host names"?  What happens?  What
> > do you see?
> >
> > How did you figure out it was a name resolution problem?
> >
> > A firefox error page?
>
> Firefox says 'Resolving herrb.net' in the popup message area at the
> bottom of the window, plays the little animation with a dot moving
> left to right and back in the tab, and the tab stays blank.

I'm seeing the same thing right now, and it comes 'by periods'. Are you
using wifi or wired? Here it's over iwm, with a kernel from yesterday.


Re: firefox (or some rthread / network stuff) broken in -current

Matthieu Herrb-7
On Sun, Feb 11, 2018 at 04:51:47PM +0100, Landry Breuil wrote:

> On Sun, Feb 11, 2018 at 03:13:07PM +0100, Matthieu Herrb wrote:
> > On Sun, Feb 11, 2018 at 02:50:30PM +0100, Martin Pieuchot wrote:
> > > On 11/02/18(Sun) 12:37, Matthieu Herrb wrote:
> > > > Hi,
> > > >
> > > > I upgraded my laptop from sources to -current yesterday. Since then
> > > > firefox stops resolving host names after a dozen minutes or so.
> > >
> > > What do you mean by "stops resolving host names"?  What happens?  What
> > > do you see?
> > >
> > > How did you figure out it was a name resolution problem?
> > >
> > > A firefox error page?
> >
> > Firefox says 'Resolving herrb.net' in the popup message area at the
> > bottom of the window, plays the little animation with a dot moving
> > left to right and back in the tab, and the tab stays blank.
>
> I'm seeing the same thing right now, and it comes 'by periods'. Are you
> using wifi or wired? Here it's over iwm, with a kernel from yesterday.

Wifi over iwm too.
--
Matthieu Herrb


Re: firefox (or some rthread / network stuff) broken in -current

Matthieu Herrb-7
In reply to this post by Martin Pieuchot
On Sun, Feb 11, 2018 at 02:50:30PM +0100, Martin Pieuchot wrote:

> On 11/02/18(Sun) 12:37, Matthieu Herrb wrote:
> > Hi,
> >
> > I upgraded my laptop from sources to -current yesterday. Since then
> > firefox stops resolving host names after a dozen minutes or so.
>
> What do you mean by "stops resolving host names"?  What happens?  What
> do you see?
>
> How did you figure out it was a name resolution problem?
>
> A firefox error page?
>
> You saw some syscalls failing via ktrace?
>
> You looked at tcpdump outputs?
>
> Some network counters were increasing?
>
> > It's not the firefox port itself, since it started after I rebooted into
> > the new kernel, while base and xenocara were rebuilding. My previous
> > kernel was from January 31.
>
> So you're saying that *some* kernels newer than January 31 expose this
> regression?  Could you bisect when it was introduced?
>
> > Other programs are fine (including chromium), but apart from maybe
> > chromium, nothing I run uses threads and name resolution at the same time.
>
> You're assuming it's related to threaded applications. Why?
>
> > Any idea of what's causing this, or should I start bisecting ?
>
> A bisection would be great.

And the winner is:
https://github.com/openbsd/src/commit/a0801e345934b8c139c255c8327f726a614b3267

Author: mpi <[hidden email]>
Date:   Fri Feb 9 07:32:35 2018 +0000

    Call socreate() before falloc() in sys_socket().
   
    This is similar to what we do in sys_socketpair() and will allow us
    to grab the KERNEL_LOCK() only after having created a socket.
   
    ok tedu@

With that commit reverted using the patch below, I have not been able to
reproduce the issue in -current.

Index: uipc_syscalls.c
===================================================================
RCS file: /cvs/OpenBSD/src/sys/kern/uipc_syscalls.c,v
retrieving revision 1.163
diff -u -r1.163 uipc_syscalls.c
--- uipc_syscalls.c 9 Feb 2018 07:32:35 -0000 1.163
+++ uipc_syscalls.c 11 Feb 2018 20:56:24 -0000
@@ -1,4 +1,4 @@
-/* $OpenBSD: uipc_syscalls.c,v 1.163 2018/02/09 07:32:35 mpi Exp $ */
+/* $OpenBSD: uipc_syscalls.c,v 1.162 2018/01/09 15:14:23 mpi Exp $ */
 /* $NetBSD: uipc_syscalls.c,v 1.19 1996/02/09 19:00:48 christos Exp $ */
 
 /*
@@ -83,7 +83,7 @@
 	struct file *fp;
 	int type = SCARG(uap, type);
 	int domain = SCARG(uap, domain);
-	int fd, cloexec, nonblock, fflag, error;
+	int fd, error;
 	unsigned int ss = 0;
 
 	if ((type & SOCK_DNS) && !(domain == AF_INET || domain == AF_INET6))
@@ -95,24 +95,23 @@
 	if (error)
 		return (error);
 
-	type &= ~(SOCK_CLOEXEC | SOCK_NONBLOCK | SOCK_DNS);
-	cloexec = (SCARG(uap, type) & SOCK_CLOEXEC) ? UF_EXCLOSE : 0;
-	nonblock = SCARG(uap, type) &  SOCK_NONBLOCK;
-	fflag = FREAD | FWRITE | (nonblock ? FNONBLOCK : 0);
-
-	error = socreate(SCARG(uap, domain), &so, type, SCARG(uap, protocol));
+	fdplock(fdp);
+	error = falloc(p, (type & SOCK_CLOEXEC) ? UF_EXCLOSE : 0, &fp, &fd);
+	fdpunlock(fdp);
 	if (error != 0)
 		goto out;
 
-	fdplock(fdp);
-	error = falloc(p, cloexec, &fp, &fd);
-	fdpunlock(fdp);
+	fp->f_flag = FREAD | FWRITE | (type & SOCK_NONBLOCK ? FNONBLOCK : 0);
+	fp->f_type = DTYPE_SOCKET;
+	fp->f_ops = &socketops;
+	error = socreate(SCARG(uap, domain), &so,
+	    type & ~(SOCK_CLOEXEC | SOCK_NONBLOCK | SOCK_DNS), SCARG(uap, protocol));
 	if (error) {
-		soclose(so);
+		fdplock(fdp);
+		fdremove(fdp, fd);
+		closef(fp, p);
+		fdpunlock(fdp);
 	} else {
-		fp->f_flag = fflag;
-		fp->f_type = DTYPE_SOCKET;
-		fp->f_ops = &socketops;
 		if (type & SOCK_NONBLOCK)
 			so->so_state |= SS_NBIO;
 		so->so_state |= ss;


--
Matthieu Herrb
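
One detail can be read directly from the diff above. In the reverted revision (the "-" lines), `type` has the SOCK_CLOEXEC, SOCK_NONBLOCK and SOCK_DNS bits stripped early on, while the unchanged context line `if (type & SOCK_NONBLOCK)` still tests `type` afterwards. The thread does not establish whether this is the cause of the regression. The sketch below only illustrates the general pattern in userland C, with made-up flag values and hypothetical function names, not the kernel's definitions.

```c
/* Made-up flag values for illustration; not the kernel's definitions. */
#define EX_TYPE_STREAM  0x0001
#define EX_NONBLOCK     0x4000
#define EX_CLOEXEC      0x8000

/*
 * Pattern A: strip the option bits from type first, then test type.
 * The test can never succeed because the bit is already gone.
 */
int
nonblock_seen_after_masking(int type)
{
    type &= ~(EX_NONBLOCK | EX_CLOEXEC);
    return (type & EX_NONBLOCK) != 0;
}

/*
 * Pattern B: capture the option from the unmasked value before
 * stripping it, and test the saved copy afterwards.  The bare socket
 * type is returned through base_type.
 */
int
nonblock_seen_with_saved_copy(int type, int *base_type)
{
    int nonblock = type & EX_NONBLOCK;

    *base_type = type & ~(EX_NONBLOCK | EX_CLOEXEC);
    return nonblock != 0;
}
```

The reverted revision already uses pattern B for the file flags, where `nonblock` is taken from `SCARG(uap, type)`. The restored code never modifies `type`, so its later test sees the original bits.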