help understanding cua/tty EBUSY behaviour?

help understanding cua/tty EBUSY behaviour?

Adam Thompson
Summary:  I open cua0 with cu(1), quit cu(1), try to re-open with cu(1)
but now it immediately fails with EBUSY.  *Usually* doesn't happen with
USB-to-serial (cuaU[0-9]) but have still seen it once or twice.

I've seen this behaviour on OpenBSD 6.4, OpenBSD 6.5, and FreeBSD 11.2,
and on 3 radically different systems (hardware-wise) so I don't think
it's a version-specific or even hardware-specific bug, more likely
something I'm failing to understand.

I'm using OpenBSD as a remote serial console server for up to 3 switches
at a time (OOB access to a few switches in my lab).  This works,
mostly... but occasionally, a serial port, almost always the onboard
hardware cua0/tty0 port, somehow wedges and requires me to reboot the
OBSD system to regain access to it.  The symptom is that when attempting
to open(2) the device, I get EBUSY... for no obvious reason.  fuser(1)
shows no other processes having a filehandle on /dev/cua00.

I don't understand why this happens inconsistently: about 75% of the
time on /dev/cua00, but only about 10% of the time on /dev/cuaU[0-1].  Of
the times it happens on /dev/cua00, in about 1/3 of cases I can re-open
the device (again using cu(1)) if I wait a minute or ten; in the other
2/3 it persists until a reboot.
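
The wait-and-retry workaround described above could be scripted rather than done by hand. What follows is only a minimal sketch in C: the helper name open_retry_busy, the attempt count, and the delay are assumptions for illustration, not anything taken from cu(1). It retries only on EBUSY and does nothing for the case where the port stays wedged until reboot.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open a serial device node, retrying only
 * when open(2) fails with EBUSY.  Returns a file descriptor on success,
 * or -1 with errno set to the reason for the last failure.
 */
int
open_retry_busy(const char *path, int attempts, unsigned int delay_sec)
{
	int fd = -1;
	int i;

	if (attempts <= 0) {
		errno = EINVAL;
		return -1;
	}

	for (i = 0; i < attempts; i++) {
		/* O_NOCTTY: do not let the line become our controlling tty. */
		fd = open(path, O_RDWR | O_NOCTTY);
		if (fd != -1 || errno != EBUSY)
			break;	/* success, or an error that waiting won't fix */
		if (i + 1 < attempts)
			sleep(delay_sec);	/* give the port time to settle */
	}
	return fd;
}
```

A caller might use something like open_retry_busy("/dev/cua00", 10, 5) and, if it still returns -1 with errno == EBUSY, treat the port as wedged.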

On the USB devices, I can - with 100% success - eliminate the problem by
walking over, unplugging the serial adapter, and re-inserting it.

I haven't tried using e.g. screen(1) from ports, I've only tested using
cu(1) in base so far.  I can try something else if there's a reason
to... on the OpenBSD box anyway, it'll be a bit harder on the FreeBSD
appliance.

As I said, I've even seen this on FreeBSD, so I expect I just need
someone to provide an explanation of what nuance of tty(4) usage I'm
missing.

Help?

Thanks,
-Adam

Re: help understanding cua/tty EBUSY behaviour?

Theo de Raadt-2
Adam Thompson <[hidden email]> wrote:

> Summary:  I open cua0 with cu(1), quit cu(1), try to re-open with
> cu(1) but now it immediately fails with EBUSY.  *Usually* doesn't
> happen with USB-to-serial (cuaU[0-9]) but have still seen it once or
> twice.
>
> I've seen this behaviour on OpenBSD 6.4, OpenBSD 6.5, and FreeBSD
> 11.2, and on 3 radically different systems (hardware-wise) so I don't
> think it's a version-specific or even hardware-specific bug, more
> likely something I'm failing to understand.
>
> I'm using OpenBSD as a remote serial console server for up to 3
> switches at a time (OOB access to a few switches in my lab).  This
> works, mostly... but occasionally, a serial port, almost always the
> onboard hardware cua0/tty0 port, somehow wedges and requires me to
> reboot the OBSD system to regain access to it.  The symptom is that
> when attempting to open(2) the device, I get EBUSY... for no obvious
> reason.  fuser(1) shows no other processes having a filehandle on
> /dev/cua00.
>
> I don't understand why this happens inconsistently - about ~75% of the
> time on /dev/cua00, but only ~10% of the time on /dev/cuaU[0-1].  Of
> the ~75% of the time it happens on /dev/cua00, about 1/3 of those
> times, if I wait a minute or ten, I can then re-open the device (again
> using cu(1)), and 2/3 of those times it persists until a reboot.
>
> On the USB devices, I can - with 100% success - eliminate the problem
> by walking over, unplugging the serial adapter, and re-inserting it.
>
> I haven't tried using e.g. screen(1) from ports, I've only tested
> using cu(1) in base so far.  I can try something else if there's a
> reason to... on the OpenBSD box anyway, it'll be a bit harder on the
> FreeBSD appliance.
>
> As I said, I've even seen this on FreeBSD, so I expect I just need
> someone to provide an explanation of what nuance of tty(4) usage I'm
> missing.


All these drivers have a "close" function which is called upon last
close().  From dev/ic/com.c:

int
comclose(dev_t dev, int flag, int mode, struct proc *p)
{
        int unit = DEVUNIT(dev);
        struct com_softc *sc = com_cd.cd_devs[unit];
        struct tty *tp = sc->sc_tty;
        int s;

#ifdef COM_CONSOLE
        /* XXX This is for cons.c. */
        if (!ISSET(tp->t_state, TS_ISOPEN))
                return 0;
#endif

        if(sc->sc_swflags & COM_SW_DEAD)
                return 0;

        (*linesw[tp->t_line].l_close)(tp, flag, p);
        s = spltty();
        if (ISSET(tp->t_state, TS_WOPEN)) {
                /* tty device is waiting for carrier; drop dtr then re-raise */
                CLR(sc->sc_mcr, MCR_DTR | MCR_RTS);
                com_write_reg(sc, com_mcr, sc->sc_mcr);
                timeout_add_sec(&sc->sc_dtr_tmo, 2);
                ...

You are observing the forcing-down of DTR and RTS for a long enough
period that the other side of the link notices the event.

Therefore it is not surprising that you are finding very similar
behaviour in many drivers and operating systems.
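
To watch DTR and RTS from userland, the modem-control bits can be read with the TIOCMGET ioctl once a descriptor is open. The sketch below is illustrative only: the helper names format_modem_bits and read_modem_state are hypothetical, and it assumes the port can be opened at all. Note that opening a port normally raises DTR itself, so this shows the state after the open, not the state during the settling period in close.

```c
#include <stdio.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: render modem-control bits, as returned by
 * ioctl(fd, TIOCMGET, &bits), in the form "DTR=0 RTS=0 DCD=1".
 */
int
format_modem_bits(int bits, char *buf, size_t len)
{
	return snprintf(buf, len, "DTR=%d RTS=%d DCD=%d",
	    (bits & TIOCM_DTR) != 0,
	    (bits & TIOCM_RTS) != 0,
	    (bits & TIOCM_CD) != 0);
}

/* Read and format the current line state; returns -1 if the ioctl fails. */
int
read_modem_state(int fd, char *buf, size_t len)
{
	int bits;

	if (ioctl(fd, TIOCMGET, &bits) == -1)
		return -1;
	return format_modem_bits(bits, buf, len);
}
```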

Re: help understanding cua/tty EBUSY behaviour?

Adam Thompson
On 2019-08-03 18:14, Theo de Raadt wrote:
> Adam Thompson <[hidden email]> wrote:
>
>> Summary:  I open cua0 with cu(1), quit cu(1), try to re-open with
>> cu(1) but now it immediately fails with EBUSY.  *Usually* doesn't
>> happen with USB-to-serial (cuaU[0-9]) but have still seen it once or
>> twice.
[...]
> You are observing the forcing-down of DTR and RTS for a long enough
> period that the other side of the link notices the event.
>
> Therefore it is not surprising you are finding this behaviour very
> similar in many drivers and operating systems.

Thanks!
I feel as though I'm seeing something that's slightly off (see below),
but I was focused on the open() end of things not the close() end - I
now need to do more reading.


FWIW, things that don't *feel* right about this:
* "long enough" doesn't typically describe multiple days, in DTE/DCE
signalling.  Far-end device is the same when testing using cua00 vs.
cuaU0.
* isn't cua(4) supposed to be the device we use to ignore line signals?  
Why would closing it fiddle with line status?  Maybe I'm reading too
much into hupcl in stty(1) manpage...
* and now I'm wondering if cu(1) fiddles with [-]clocal and/or [-]hupcl

As I said, more reading now you've given me a direction and I'll
probably come back with at least one or two of my own answers.

-Adam

Re: help understanding cua/tty EBUSY behaviour?

Theo de Raadt-2
Adam Thompson <[hidden email]> wrote:

> On 2019-08-03 18:14, Theo de Raadt wrote:
> > Adam Thompson <[hidden email]> wrote:
> >
> >> Summary:  I open cua0 with cu(1), quit cu(1), try to re-open with
> >> cu(1) but now it immediately fails with EBUSY.  *Usually* doesn't
> >> happen with USB-to-serial (cuaU[0-9]) but have still seen it once or
> >> twice.
> [...]
> > You are observing the forcing-down of DTR and RTS for a long enough
> > period that the other side of the link notices the event.
> >
> > Therefore it is not surprising you are finding this behaviour very
> > similar in many drivers and operating systems.
>
> Thanks!
> I feel as though I'm seeing something that's slightly off (see below),
> but I was focused on the open() end of things not the close() end - I
> now need to do more reading.
>
>
> FWIW, things that don't *feel* right about this:
> * "long enough" doesn't typically describe multiple days, in DTE/DCE
> signalling.  Far-end device is the same when testing using cua00
> vs. cuaU0.
> * isn't cua(4) supposed to be the device we use to ignore line
> signals?  Why would closing it fiddle with line status?  Maybe I'm
> reading too much into hupcl in stty(1) manpage...
> * and now I'm wondering if cu(1) fiddles with [-]clocal and/or [-]hupcl
>
> As I said, more reading now you've given me a direction and I'll
> probably come back with at least one or two of my own answers.

There are a few obvious races being avoided.  If the cua node is closed,
you don't want a getty to start immediately on a line with unknown condition.
You want the signalling to settle, and you don't want a double open of cua
and tty.  You want the new process to not encounter a "strange line condition".

I believe that is why this settling behaviour has been in the tty
subsystem for nearly two decades, if not longer.  Previous to that,
our cua behaviour is a clone of the behaviour in SunOS.  That
was a bit fishy right from the get-go, but we desperately needed such
a thing without resorting to special flags to open().

I believe this is about giving the future open of a cua or tty a known
condition, rather than the line condition during the close of the previous.

A simplistic alternative would be to add the 2-second delay to open,
but now that I've said that you'll know that's the wrong approach,
and therefore this code is found in close.

Anyway, look, this is 20+ year old behaviour.  If you ask questions
and propose changes to this, the onus is entirely on you to prove
that the current situation is wrong and that you have a better idea.
This should not be done in English words, but with a diff which is then
tested in a vast number of circumstances; only then can the impact
be discussed.