pvclock_gpa = vcpu->vc_pvclock_system_gpa & 0xFFFFFFFFFFFFFFF0; <-- controlled by the guest
if (!pmap_extract(vm->vm_map->pmap, pvclock_gpa, &pvclock_hpa))
	return (EINVAL);
pvclock_ti = (void*) PMAP_DIRECT_MAP(pvclock_hpa);

/* START next cycle (must be odd) */
pvclock_ti->ti_version =
    (++vcpu->vc_pvclock_version << 1) | 0x1;
Three things are wrong:
1) The RO protections are not enforced, so the guest can have data
   written to a GPA it is only allowed to access as RO.
2) If 'pvclock_ti' crosses a page, its second half could point to an HPA
that doesn't belong to the guest. The guest can therefore, to some
limited extent, overwrite host kernel memory.
3) The pmap is not locked, so if the GPA gets unmapped and its
corresponding HPA recycled, there is a small window where the (new)
content of the HPA can get overwritten.
There is, in fact, a fourth case. Watch closely. On AMD CPUs the NPTs are
a regular pmap. The higher half of the GPA space is therefore mapped to
host kernel memory as KVA. Given that there is no check on PG_u here, the
guest can just put a host KVA in pvclock_gpa, and have its content be
overwritten. This gives the guest a write-where primitive.
The OpenBSD kernel does not perform full ASLR, in that the PTE space and
direct map are at static addresses (contrary to eg NetBSD where everything
is randomized). These addresses are known. The guest can therefore use,
for example, the static address of the direct map to write to whatever
HPA it chooses, simply by passing such an address as the pvclock GPA.
This means the guest can overwrite whatever host kernel memory, and can
control *where* to write. I have tested this, and it works.
The guest can also choose *what* to write, because it just so happens that
'vc_pvclock_version' is the number of VMEXITs that occurred with pvclock
enabled, and the guest can reliably craft this value. So this is not just
a write-where, this is a full guest-to-host write-what-where.
Even with proper ASLR, the randomization could have been somewhat
bypassable, because VMD does a pass-through of RDMSR on AMD CPUs (??),
which can leak HPAs such as HSAVE_PA.
(Speaking of the direct map, notice how an alignment bug in locore0.S
causes the first 2MB of .text to be writable on Intel CPUs. So there is a
static address that maps the kernel .text as writable.)
There are additional assorted bugs and vulns that could be used to some
extent:
- On AMD CPUs the CPL check on XSETBV VMEXITs must be performed by
software. VMD forgot to do that, so from guest-userland, we can control
the XCR0 that guest-kernel will use.
- This XSETBV issue actually has an additional ramification. Right now
OpenBSD doesn't check that the guest XCR0 is a subset of the host XCR0,
which means that the guest can use more FPU states than the host allows.
It looks like this check was lost when fixing another bug I reported a
year ago, which could cause a guest-to-host DoS.
- The TLB handling of guest pages is broken, in that the INVEPT
instructions in the host could be issued on the wrong CPUs. This means
that if UVM decides to swap out a guest page, the guest could still
access it via stale TLB entries. On AMD CPUs, there is no TLB handling
at all (??).
- vmx_load_pdptes is broken.
In order to make this whole thing less of a security joke, I would
suggest:
- Fix TLB handling, sanitize the GPAs, lock the pages correctly.
- Don't pass-through RDMSR.
- Fix the XSETBV issues.
- Provide *real* ASLR: randomize the PTE space and the direct map.
- Fix the alignment bug in the direct map to not map the text as writable.
>>Module name: src
>>Changes by: [hidden email] 2020/02/15 15:59:55
>> sys/arch/amd64/amd64: vmm.c
>>Add bounds check on addresses passed from guests in pvclock.
>>Fixes an issue where a guest can write to host memory by passing bogus addresses.
>I'm a bit confused here. It is not because the GPAs are contiguous that the
>HPAs are too. If the structure crosses a page, the guest still can write to