Bug with scp in OpenBSD

illya.meyer@wiesan.de
Good afternoon,

I think I've found a bug in OpenBSD, and it seems to have existed since 6.5.

When I try to scp several archives (OpenBSD install files) from one
BSD machine to another, I get this error on the client side:
---- snip 8< ----
client_loop: send disconnect: Broken pipe
---- snip 8< ----

and this on the server in /var/log/auth.log:
---- snip 8< ----
[...] sshd[20245]: ssh_dispatch_run_fatal: Connection from user root 10.69.0.15 port 5835: message authentication code incorrect
---- snip 8< ----

Scenario:
We have about 70 BSD boxes; one is the "controller" and the others are
firewalls.

I installed a plain 6.4 on a test controller and could copy from it to
the firewall targets without problems.
The error first occurred when I updated the test controller to 6.5, and it is
still present in 6.6-current (installed 2019-11-21 12:00 GMT+1).
The version running on the target host doesn't matter.

I tested copying to 10 target hosts and got the following result:
- 7 hosts always worked without problems
- 3 hosts had problems, but not every copy job failed

Some more information:
- The controller is a virtual machine on ESXi.
- The target hosts are physical machines.
- The target hosts work as network bridges with two interfaces.

For debugging, I started a second sshd on one target host with:

/usr/sbin/sshd -ddd -p 50 -E sshd.debug.4.log

debug2: load_server_config: filename /etc/ssh/sshd_config
debug2: load_server_config: done config len = 172
debug2: parse_server_config: config /etc/ssh/sshd_config len 172
debug3: /etc/ssh/sshd_config:39 setting AuthorizedKeysFile .ssh/authorized_keys
debug3: /etc/ssh/sshd_config:86 setting Subsystem sftp /usr/libexec/sftp-server
debug1: sshd version OpenSSH_8.1, LibreSSL 3.0.2
debug1: private host key #0: ssh-rsa SHA256:[...]
debug1: private host key #1: ecdsa-sha2-nistp256 SHA256:[...]
debug1: private host key #2: ssh-ed25519 SHA256:[...]
debug1: rexec_argv[0]='/usr/sbin/sshd'
debug1: rexec_argv[1]='-ddd'
debug1: rexec_argv[2]='-p'
debug1: rexec_argv[3]='50'
debug1: rexec_argv[4]='-E'
debug1: rexec_argv[5]='sshd.debug.4.log'
debug2: fd 4 setting O_NONBLOCK
debug1: Bind to port 50 on 0.0.0.0.
Server listening on 0.0.0.0 port 50.
debug2: fd 5 setting O_NONBLOCK
debug1: Bind to port 50 on ::.
Server listening on :: port 50.
debug1: fd 6 clearing O_NONBLOCK
debug1: Server will not fork when running in debugging mode.
debug3: send_rexec_state: entering fd = 9 config len 172
debug3: ssh_msg_send: type 0
debug3: send_rexec_state: done
debug1: rexec start in 6 out 6 newsock 6 pipe -1 sock 9

On the client side, I started:
scp -vvvP 50 * root@foreignhost:/var/rel/

SHA256                                        100% 1989    79.1KB/s   00:00
SHA256.sig                                    100% 2141    85.9KB/s   00:00
base66.tgz                                    100%  237MB   4.8MB/s   00:49
bsd                                             0%    0     0.0KB/s   --:-- ETA

[logfile ssh.debug.4.log is attached]

Please let me know how I should continue testing to help
you find the problem.

Thank you and kind regards,
Illya Meyer


Attachments: dmesg-controller (9K), ssh.debug.4.log (14K), sshd.debug.4.log (1K)

Re: Bug with scp in OpenBSD

Darren Tucker-3
On Fri, 22 Nov 2019 at 01:25, [hidden email]
<[hidden email]> wrote:
> I think, I've found a bug in OpenBSD and it seems, it exists since 6.5.
> When I try to scp several archives (OpenBSD install files) from one
> BSD-Machine to another, I get on client side the error:
[...]
> [...] sshd[20245]: ssh_dispatch_run_fatal: Connection from user root
> 10.69.0.15 port 5835: message authentication code incorrect

It's not a bug in scp.  It's possible that it's a bug in OpenSSH or
OpenBSD, but unlikely.

The message means that the data was changed in transit between client
and server, causing SSH's integrity check to fail.  Many things have been
found over the years to cause this, including faulty or buggy network
equipment, RAM, Ethernet interfaces and network drivers.  Many of the
known causes are listed here:
https://bugzilla.mindrot.org/show_bug.cgi?id=845
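
To illustrate the kind of integrity check described above, here is a minimal sketch. It is not OpenSSH's actual code: it simply uses the openssl command-line tool to compute an HMAC-SHA256 tag, and the key and payload strings are made up for the example. The point is only that a single changed character in transit makes the receiver's recomputed tag differ from the one the sender attached.

```shell
# Compute an HMAC-SHA256 tag over stdin with the given key.
# The hex digest is the last field of openssl's output.
mac_of() {
    openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

# Return 0 if the received payload still matches the tag computed by the sender.
verify_mac() {
    key=$1 sent_tag=$2 received=$3
    [ "$(printf '%s' "$received" | mac_of "$key")" = "$sent_tag" ]
}

key='demo-session-key'    # hypothetical key, for illustration only
tag=$(printf '%s' 'base66.tgz chunk' | mac_of "$key")

# Unchanged payload: the recomputed tag matches.
verify_mac "$key" "$tag" 'base66.tgz chunk' && echo "intact: MAC matches"
# One character altered in transit: the recomputed tag no longer matches.
verify_mac "$key" "$tag" 'base66.tgz chunK' || echo "altered: MAC mismatch"
```

In SSH a mismatch like the second case is fatal for the connection, which is consistent with the "message authentication code incorrect" line in auth.log.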

> /usr/sbin/sshd -ddd -p 50 -E sshd.debug.4.log

If you add "-e" to the command line you'll get more information from
after sshd re-execs itself.

>  The controller is a virtual machine on ESXi.

This is the first variable I'd try removing if possible.  There have
been other cases of VMware networking causing problems (although not
with these symptoms):
https://marc.info/?l=openssh-unix-dev&m=153535111501535&w=2

from the dmesg:
> em0 at pci2 dev 0 function 0 "Intel 82545EM" rev 0x01:

Alternatively, I'd try switching the network interface from em to vio.

--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860  37F4 9357 ECEF 11EA A6FA (new)
    Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.


Re: Bug with scp in OpenBSD

illya.meyer@wiesan.de
On 22.11.19 at 07:14, Darren Tucker wrote:
> It's not a bug in scp.  It's possible that it's a bug in OpenSSH or
> OpenBSD but unlikely.
>
> The message means that the data was changed in transit between client
> and server causing SSH's integrity check to fail.  A large number of
> things have been found over the years to be causes of this, including
> faulty/buggy network equipment, ram, ethernet interfaces and network
> drivers.  Many of the known causes are listed here:
> https://bugzilla.mindrot.org/show_bug.cgi?id=845

I'm at a bit of a loss now.

My test setup today was:
A physical (BSD) PC with 6.6; my attempt to copy resulted in the same error.
Then I went to our outpost, connected this PC to the same switch as
the firewall, and copied over 30 times without any error.
A copy from our controller (6.6) to the second PC failed, too.
So in my opinion the problem must be a combination of something "on
the way" and the 6.6 installation :-(

I tried once more to copy from a 6.4 BSD to the firewall and got no
error. Do you have any idea what could have changed between 6.4 and
6.5? Or how I can work around this error? I had a look at the
scp logfile from a working (6.4) copy and saw no significant differences
between it and the error logfile :-( The mindrot page has a paragraph
about a specific cipher that caused the problem in the reporter's setup.
Do you think this could be my problem, too? Or what else could I do?

Of course, I could copy my files another way, such as http(s), but that
leaves a bad aftertaste, because you know that something that ran in an
earlier release for over 2 years without a problem is no longer working.
And you don't know whether this problem causes other failures :-(

>> /usr/sbin/sshd -ddd -p 50 -E sshd.debug.4.log
>
> If you add "-e" to the command line you'll get more information from
> after when sshd re-execs itself.

Same output with "-e" as without :-(

>>   The controller is a virtual machine on ESXi.
>
> This is the first variable I'd try removing if possible.  There have
> been other cases of VMWare networking causing problems (although not
> these symptoms):
> https://marc.info/?l=openssh-unix-dev&m=153535111501535&w=2

Yes, I tried; same problem with a physical PC.

> from the dmesg:
>> em0 at pci2 dev 0 function 0 "Intel 82545EM" rev 0x01:
>
> Alternatively, I'd try switch the network interface to vio from em.

Unfortunately that's not possible; the firewall is like an appliance with
two built-in NICs and no room for expansion.

Thank you for your help and kind regards,
Illya Meyer


Re: Bug with scp in OpenBSD

Sergey Prysiazhnyi-2
Hello.

I can confirm an identical problem with 6.6 on my side, and it also
affects sftp(1) sessions.



Re: Bug with scp in OpenBSD

Darren Tucker-3
In reply to this post by illya.meyer@wiesan.de
On Fri, 22 Nov 2019 at 22:08, [hidden email]
<[hidden email]> wrote:
[...]
> My test-setup today was:
> A real (BSD-)PC with 6.6 and my try to copy resulted in the same error.

so real hardware copying over the same network path to the same
destination has the same problem.

> Then I went to our outpost and connected this PC on the same switch as
> the firewall and copied over 30 times without any error.

so a direct connection between source and destination worked OK.

> A copy from our controller (6.6) to the second PC failed, too.

so copying from the original source over the same network path to a
new destination has the same problem.

> So in my opinion the problem must be a combination from something „on
> the way“ and the 6.6er installation :-(

Sounds like it's something specific to this network path.  What's on it?

> I tried once more to copy from a 6.4 BSD to the firewall and got no
> error. Do you have any idea, what could have changed between 6.4 and
> 6.5!?

6 months' worth of development.

> I had a look at the
> scp-logfile from a (6.4er) working copy and saw no great differences
> between this and the error-logfile :-( On the mindrot-page was a
> paragraph about a specific cipher, that caused the problem at his setup.
> Do you think, this could be my problem, too?

Possibly.  I have seen one instance where aes-gcm ciphers failed, I
suspect due to a faulty CPU.
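
One way to check whether a particular cipher is involved is to repeat the same copy once per cipher. The following is only a sketch under assumptions: the helper name is made up, the file and destination are taken from the scp command earlier in the thread, and it prints the commands instead of running them so the list can be reviewed first. It relies on scp's -c option and on "ssh -Q cipher", which lists the ciphers the local client supports.

```shell
# Print one scp command per cipher name read from stdin (dry run only;
# nothing is copied). print_cipher_tests is a hypothetical helper name.
print_cipher_tests() {
    file=$1 dest=$2
    while IFS= read -r cipher; do
        [ -n "$cipher" ] || continue    # skip empty lines
        printf 'scp -c %s %s %s\n' "$cipher" "$file" "$dest"
    done
}

# Usage (run by hand on the sending host):
#   ssh -Q cipher | print_cipher_tests base66.tgz root@foreignhost:/var/rel/
```

If the copy fails with every cipher, the cipher is probably not the deciding factor; if it fails only with some, that narrows the search considerably.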

> Or what could I else do?

Some suggestions:

a) you could tell us which ciphers you saw for the problematic and
working connections.

b) you could also install Portable OpenSSH 8.0 on your 6.6 system and
test copying to it.  That was the version of OpenSSH that shipped with
6.5 (likewise OpenSSH 7.9 that shipped with 6.4).  That would narrow
it down to whether or not it's OpenSSH or some other part of the
system.

c) ship a large test file via netcat to a netcat listener over the
same network paths and compare sha256sums of source and destination.
If you can reproduce a problem without ssh that'll narrow it down.

d) compare the ifconfigs for the 6.4 host and the 6.6 host and see if
anything is different.
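
Suggestion (c) could be sketched roughly as follows. The port number and file names are placeholders, the nc invocations are shown as comments because they have to be run by hand on two different hosts, and the helper names are invented for this example. The helpers only compare SHA-256 digests, using sha256(1) on OpenBSD and falling back to sha256sum elsewhere.

```shell
# Print only the SHA-256 hex digest of a file, using whichever tool is installed.
digest_of() {
    if command -v sha256 >/dev/null 2>&1; then
        sha256 -q "$1"                      # OpenBSD
    else
        sha256sum "$1" | awk '{print $1}'   # GNU coreutils
    fi
}

# Return 0 if both files have the same digest.
same_digest() {
    [ "$(digest_of "$1")" = "$(digest_of "$2")" ]
}

# The transfer itself (port 5000 and the file names are placeholders):
#   on the receiver:  nc -l 5000 > received.bin
#   on the sender:    nc -N foreignhost 5000 < testfile.bin
# Afterwards run digest_of on each side and compare the two values, or use
# same_digest if both files end up on one machine.
```

A digest mismatch here would show corruption on the path without ssh being involved at all; matching digests over many runs point back towards something specific to the ssh traffic.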

> Of course, I could copy my files on another way, like http(s), but it
> has a negative smack, because you know, something is now not working,
> what was running in a earlier release and over 2 years without a
> problem. And you don't know, if this problem causes other failures :-(
>
> >> /usr/sbin/sshd -ddd -p 50 -E sshd.debug.4.log
> >
> > If you add "-e" to the command line you'll get more information from
> > after when sshd re-execs itself.
>
> Same output with „-e“ as without :-(
>
> >>   The controller is a virtual machine on ESXi.
> >
> > This is the first variable I'd try removing if possible.  There have
> > been other cases of VMWare networking causing problems (although not
> > these symptoms):
> > https://marc.info/?l=openssh-unix-dev&m=153535111501535&w=2
>
> Yes I tried, same problem with a real PC.
>
> > from the dmesg:
> >> em0 at pci2 dev 0 function 0 "Intel 82545EM" rev 0x01:
> >
> > Alternatively, I'd try switch the network interface to vio from em.
>
> That's unfortunately not possible, the firewall is like a appliance with
> two build-in nics and no room for extensions.

The dmesg says it's from the VM controller not the hardware firewalls.
I don't think we've seen a dmesg from a firewall.

--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860  37F4 9357 ECEF 11EA A6FA (new)
    Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.


Re: Bug with scp in OpenBSD

Darren Tucker-3
In reply to this post by Sergey Prysiazhnyi-2
On Sat, 23 Nov 2019 at 03:28, Sergey Prysiazhnyi
<[hidden email]> wrote:
> I confirm the existence of an identical problem for 6.6 from my side and for
> sftp(1) sessions as also.

OK, but since you have not provided any information about the systems
and environment, it's unlikely anyone can help you.  I suggest you
a) describe your setup, including network equipment and dmesgs of
affected hosts, and b) read the descriptions of the other problematic
setup and see if you can find any commonalities.



Re: Bug with scp in OpenBSD

illya.meyer@wiesan.de
In reply to this post by Darren Tucker-3
On 23.11.19 at 11:34, Darren Tucker wrote:
> so real hardware copying over the same network path to the same
> destination has the same problem.

Correct.

> so a direct connection between source and destination worked OK.

Correct.

> so copying from the original source over the same network path to a
> new destination has the same problem.

Correct.

As a diagram :-)

Ctrl ... WAN ... FW
6.6 --- scp -->  6.6                     => problem
6.4 --- scp -->  6.6                     => no problem

PC   ... WAN ... FW
6.6 --- scp -->  6.6                     => problem

                  FW ... LAN ... PC (same as above)
                  6.6  <- scp -- 6.6      => no problem

Ctrl ... WAN ... FW ... LAN ... PC
6.6  ---------- scp --------->  6.6      => problem


> Sounds like it's something specific to this network path.  What's on it?

Good question … at least two Cisco routers, maybe some hardware from
another ISP, and a connection via fibre or copper. I'm trying to work out
the differences and commonalities between the outposts that work and
those that don't.

> 6 months worth of development.

OK, stupid question. I'm sorry about that.

> Some suggestions:
>
> a) you could tell us which ciphers you saw for the problematic and
> working connections.

I attached two scp logfiles, one from a working copy and one from a
failing copy. They look the same, except for a few attempts (at the
beginning) to access identity files that don't exist.
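
For a quick side-by-side of the negotiated algorithms, the relevant lines can be pulled out of two client debug logs and diffed. This is a sketch with invented helper names; it assumes the logs come from ssh or scp run with -v or higher, where the negotiated cipher and MAC appear on "kex: ... cipher:" lines.

```shell
# Print the negotiated cipher/MAC lines from an ssh -v debug log.
negotiated_algorithms() {
    grep -E 'kex: (server->client|client->server) cipher:' "$1" || true
}

# Diff the negotiated algorithms of two logs; returns 0 if they are identical.
compare_negotiation() {
    a=$(mktemp) b=$(mktemp)
    negotiated_algorithms "$1" > "$a"
    negotiated_algorithms "$2" > "$b"
    status=0
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Usage with the two attached logs:
#   compare_negotiation ssh.debug.64.working.log ssh.debug.66.not-working.log
```

No output and a zero exit status mean both connections negotiated the same cipher and MAC in each direction.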

> b) you could also install Portable OpenSSH 8.0 on your 6.6 system and
> test copying to it.  That was the version of OpenSSH that shipped with
> 6.5 (likewise OpenSSH 7.9 that shipped with 6.4).  That would narrow
> it down to whether or not it's OpenSSH or some other part of the
> system.

I installed them crosswise *):

On OpenBSD 6.6:
- OpenSSH 7.9 did NOT work, but I got a different error message a
few seconds into the copy:
packet_write_poll: Connection to 10.25.0.253 port 22: Permission denied
lost connection
- OpenSSH 8.0 did NOT work, with the old error message:
client_loop: send disconnect: Broken pipe
lost connection

On OpenBSD 6.4:
- OpenSSH 8.0 worked fine without problems.

> c) ship a large test file via netcat to a netcat listener over the
> same network paths and compare sha256sums of source and destination.
> If you can reproduce a problem without ssh that'll narrow it down.

I've copied a lot of data over the link without any problems. The
checksum was always OK.

> d) compare the ifconfigs for the 6.4 host and the 6.6 host and see if
> anything is different.

The ifconfig -a output from 6.4 and 6.6 looks exactly the same. The config
files are of course the same; nothing changed during the upgrade.

> The dmesg says it's from the VM controller not the hardware firewalls.
> I don't think we've seen a dmesg from a firewall.

I've attached a dmesg from the 6.6 firewall that I used as the test target
all the time. The PC from the test above is the same hardware model.

Thank you and kind regards,
Illya Meyer

*) I hope my installation was done correctly (example: 8.0 on 6.4)
./configure --prefix=/usr/local/8.0
make
make install
cd /usr/local/8.0/bin
./ssh -V
OpenSSH_8.0p1, LibreSSL 2.8.2
./scp <source> <dest>


Attachments: dmesg.firewall (8K), ssh.debug.64.working.log (13K), ssh.debug.66.not-working.log (10K)

Re: Bug with scp in OpenBSD

Darren Tucker-3
On Thu, 28 Nov 2019 at 01:54, [hidden email]
<[hidden email]> wrote:
[...]

> On OpenBSD 6.6:
> - OpenSSH 7.9 worked NOT, but I've got a different error message after a
> few seconds of the copy:
> packet_write_poll: Connection to 10.25.0.253 port 22: Permission denied
> lost connection
> - OpenSSH 8.0 worked NOT, old error message:
> client_loop: send disconnect: Broken pipe
> lost connection
>
> On OpenBSD 6.4:
> - OpenSSH 8.0 worked fine without problems.

Portable OpenSSH 8.0 contains all of the protocol changes in OpenBSD
6.6, so I think the combination of these two tests points to the
problem being the combination of 6.6 and something about your link.

> I've copied a lot of bits through the cable without any problems. The
> checksum was always ok.

I'm stumped, sorry.



Re: Bug with scp in OpenBSD

illya.meyer@wiesan.de
On 07.12.19 at 13:20, Darren Tucker wrote:

> On Thu, 28 Nov 2019 at 01:54, [hidden email]
> <[hidden email]> wrote:
> [...]
>> On OpenBSD 6.6:
>> - OpenSSH 7.9 worked NOT, but I've got a different error message after a
>> few seconds of the copy:
>> packet_write_poll: Connection to 10.25.0.253 port 22: Permission denied
>> lost connection
>> - OpenSSH 8.0 worked NOT, old error message:
>> client_loop: send disconnect: Broken pipe
>> lost connection
>>
>> On OpenBSD 6.4:
>> - OpenSSH 8.0 worked fine without problems.
>
> Portable OpenSSH 8.0 contains all of the protocol changes in OpenBSD
> 6.6, so I think the combination of these two tests points to the
> problem being the combination of 6.6 and something about your link.
>
>> I've copied a lot of bits through the cable without any problems. The
>> checksum was always ok.
>
> I'm stumped, sorry.
Nevertheless, thank you for your help and effort.

So, please have a cookie for that :-) (it was the first baking test).

I'll keep you informed if I find anything out.

Kind regards,
Illya Meyer


Attachment: bsd-cookie.jpg (416K)