>Synopsis: When using a vnd(4) device, attempting to copy
more than about 2 or 3 gigabytes to it causes the system to hang.
The Cache: memory line increases to use all physical memory, but
the system does not enter swap.
Machine : amd64
When copying files to a vnd(4) device, the system will
continually use more and more memory until it hangs.
top(1) shows an rsync(1) process stuck in "needbuf" under the WAIT
column. The hang is only partial, though: I can still move the
mouse, and if an xterm has been opened, I can still type
text into it. Trying to actually execute a program,
however, does not work: text can still be typed into the
terminal after pressing return, but the terminal does not
respond to ^C and the program never launches.
Switching from X11 to the console via CTRL+ALT+F1 causes a
hard hang that requires holding the power button for 5
seconds in order to shut down the system.
top(1) shows the "Cache:" line continually increasing up
to the point of using all physical memory. However, once
it gets there the system does not start using swap.
The same behavior is exhibited when cp(1) is used instead
of rsync(1), so it does not seem to be program specific.
In some cases, I can ^C rsync or cp(1) and the system can
still be used, but the process stuck in "needbuf" never
goes away (or at least, not in the 5 or so minutes I
waited). It also cannot be pkill(1)ed, even with a
"pkill -9 rsync".
I am aware that the following dmesg(1) comes from a custom
kernel. However, the problem still occurs on a snapshot.
# I can usually trigger the bug after copying 2-3 gigabytes of
# files to the vnd(4) device. I create a 5 gigabyte vnd(4) device
# here. If your system has much more memory than mine does (16 GB),
# it may take more to trigger the bug.
newfs -O2 vnd0d
mount /dev/vnd0d /mnt/vndtest
doas rsync -au --progress /usr/obj/* /mnt/vndtest
# It doesn't matter where the files come from, I just picked a
# filesystem containing numerous files, with a mix of large and
# small files as well.
When the system hangs, top shows the following:
PID USERNAME PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND
95098 root -6 0 60M 4508K idle needbuf 0:29 0.00% rsync
43786 root 2 0 60M 5280K idle select 0:21 0.00% rsync
62524 root 10 0 32M 3520K idle inode 0:03 0.00% rsync
Eventually, the rsync processes in "select" and "inode" will
terminate, but the one stuck in "needbuf" does not.
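To pick the stuck process out of a saved listing without reading it by
eye, something like the following could be used. This is only a sketch:
it assumes the column order shown above (WAIT is the eighth field), and
the function name needbuf_pids is made up for illustration.

```shell
# Print the PID of every process whose WAIT column reads "needbuf".
# Assumes the top(1) column order shown in this report:
# PID USERNAME PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND
needbuf_pids() {
	awk '$8 == "needbuf" { print $1 }'
}

# Example run against the listing from this report.
needbuf_pids <<'EOF'
PID USERNAME PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND
95098 root -6 0 60M 4508K idle needbuf 0:29 0.00% rsync
43786 root 2 0 60M 5280K idle select 0:21 0.00% rsync
62524 root 10 0 32M 3520K idle inode 0:03 0.00% rsync
EOF
```

For the listing above this prints only 95098, the rsync process that
never goes away.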
Since it may be relevant, I have tweaked a few knobs in sysctl.conf.
For the sake of brevity, many default and commented lines have been
omitted.
If I have shot myself in the foot by turning these knobs, I will
take full responsibility. However, I have isolated the commit that
caused the hang to start, and I experienced no issues before it.
# $OpenBSD: sysctl.conf,v 1.56 2014/05/06 23:05:51 tedu Exp $
# This file contains a list of sysctl options the user wants set at
# boot time. See sysctl(3) and sysctl(8) for more information on
# the many available variables.
net.inet.ip.forwarding=1 # 1=Permit forwarding (routing) of IPv4 packets
net.inet6.ip6.forwarding=1 # 1=Permit forwarding (routing) of IPv6 packets
machdep.kbdreset=1 # permit console CTRL-ALT-DEL to do a nice halt
machdep.lidaction=0 # laptop lid closes cause a suspend 0=no 1=yes
kern.bufcachepercent=60 # As of 2018-04-02, 90% breaks networking
kern.maxfiles=30000 # increase maxfiles for KDE4
kern.shminfo.shmall=262144 # For KDE4, increase shared memory a ton for memory-hogs, in hw.pagesize units. Currently set to 1024MB
kern.shminfo.shmmni=1024 # KDE4 wants this increased too
kern.maxvnodes=1000000 # increase vnode cache to improve pathname->inode lookups
kern.somaxconn=2048 # increase total possible open sockets
vfs.ffs.dirhash_maxmem=33554432 # increase dirhash memory to 32MB to speed up scanning large directories
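
As a sanity check on the sizes quoted in the comments above, the values
can be converted with shell arithmetic. This is a rough sketch, not a
measurement: it assumes hw.pagesize is 4096 bytes (the usual amd64
value) and that kern.bufcachepercent is a percentage of the 16 GB of
physical memory. The variable names are illustrative only.

```shell
pagesize=4096            # assumed hw.pagesize, in bytes
shmall_pages=262144      # kern.shminfo.shmall, in pages
dirhash_bytes=33554432   # vfs.ffs.dirhash_maxmem, in bytes
ram_mb=$((16 * 1024))    # 16 GB of physical memory, in MB
bufcache_pct=60          # kern.bufcachepercent

# Convert each setting to megabytes (integer arithmetic, rounds down).
shmall_mb=$((shmall_pages * pagesize / 1024 / 1024))
dirhash_mb=$((dirhash_bytes / 1024 / 1024))
bufcache_mb=$((ram_mb * bufcache_pct / 100))

echo "shmall:   ${shmall_mb} MB"
echo "dirhash:  ${dirhash_mb} MB"
echo "bufcache: up to about ${bufcache_mb} MB"
```

The first two results, 1024 MB and 32 MB, match the comments in the
file. Under the stated assumption, the buffer cache limit works out to
roughly 9.6 GB of the 16 GB installed.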
Reverting the following commit fixes the issue for me. I
also had to adjust src/sys/sys/buf.h because the prototype
of getblk() changed from
struct buf *getblk(struct vnode *, daddr_t, int, int, int);
to
struct buf *getblk(struct vnode *, daddr_t, int, int, uint64_t);
in the interim.