summaryrefslogtreecommitdiffstats
path: root/arch/arm64/include/asm/kvm_host.h (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for-linus-non-x86' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2024-09-161-2/+20
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm updates from Paolo Bonzini: "These are the non-x86 changes (mostly ARM, as is usually the case). The generic and x86 changes will come later" ARM: - New Stage-2 page table dumper, reusing the main ptdump infrastructure - FP8 support - Nested virtualization now supports the address translation (FEAT_ATS1A) family of instructions - Add selftest checks for a bunch of timer emulation corner cases - Fix multiple cases where KVM/arm64 doesn't correctly handle the guest trying to use a GICv3 that wasn't advertised - Remove REG_HIDDEN_USER from the sysreg infrastructure, making things little simpler - Prevent MTE tags being restored by userspace if we are actively logging writes, as that's a recipe for disaster - Correct the refcount on a page that is not considered for MTE tag copying (such as a device) - When walking a page table to split block mappings, synchronize only at the end the walk rather than on every store - Fix boundary check when transfering memory using FFA - Fix pKVM TLB invalidation, only affecting currently out of tree code but worth addressing for peace of mind LoongArch: - Revert qspinlock to test-and-set simple lock on VM. - Add Loongson Binary Translation extension support. - Add PMU support for guest. - Enable paravirt feature control from VMM. - Implement function kvm_para_has_feature(). RISC-V: - Fix sbiret init before forwarding to userspace - Don't zero-out PMU snapshot area before freeing data - Allow legacy PMU access from guest - Fix to allow hpmcounter31 from the guest" * tag 'for-linus-non-x86' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (64 commits) LoongArch: KVM: Implement function kvm_para_has_feature() LoongArch: KVM: Enable paravirt feature control from VMM LoongArch: KVM: Add PMU support for guest KVM: arm64: Get rid of REG_HIDDEN_USER visibility qualifier KVM: arm64: Simplify visibility handling of AArch32 SPSR_* KVM: arm64: Simplify handling of CNTKCTL_EL12 LoongArch: KVM: Add vm migration support for LBT registers LoongArch: KVM: Add Binary Translation extension support LoongArch: KVM: Add VM feature detection function LoongArch: Revert qspinlock to test-and-set simple lock on VM KVM: arm64: Register ptdump with debugfs on guest creation arm64: ptdump: Don't override the level when operating on the stage-2 tables arm64: ptdump: Use the ptdump description from a local context arm64: ptdump: Expose the attribute parsing functionality KVM: arm64: Add memory length checks and remove inline in do_ffa_mem_xfer KVM: arm64: Move pagetable definitions to common header KVM: arm64: nv: Add support for FEAT_ATS1A KVM: arm64: nv: Plumb handling of AT S1* traps from EL2 KVM: arm64: nv: Make AT+PAN instructions aware of FEAT_PAN3 KVM: arm64: nv: Sanitise SCTLR_EL1.EPAN according to VM configuration ...
| * Merge branch kvm-arm64/vgic-sre-traps into kvmarm-master/nextMarc Zyngier2024-09-121-0/+2
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/vgic-sre-traps: : . : Fix the multiple of cases where KVM/arm64 doesn't correctly : handle the guest trying to use a GICv3 that isn't advertised. : : From the cover letter: : : "It recently appeared that, when running on a GICv3-equipped platform : (which is what non-ancient arm64 HW has), *not* configuring a GICv3 : for the guest could result in less than desirable outcomes. : : We have multiple issues to fix: : : - for registers that *always* trap (the SGI registers) or that *may* : trap (the SRE register), we need to check whether a GICv3 has been : instantiated before acting upon the trap. : : - for registers that only conditionally trap, we must actively trap : them even in the absence of a GICv3 being instantiated, and handle : those traps accordingly. : : - finally, ID registers must reflect the absence of a GICv3, so that : we are consistent. : : This series goes through all these requirements. The main complexity : here is to apply a GICv3 configuration on the host in the absence of a : GICv3 in the guest. This is pretty hackish, but I don't have a much : better solution so far. : : As part of making wider use of of the trap bits, we fully define the : trap routing as per the architecture, something that we eventually : need for NV anyway." : . KVM: arm64: selftests: Cope with lack of GICv3 in set_id_regs KVM: arm64: Add selftest checking how the absence of GICv3 is handled KVM: arm64: Unify UNDEF injection helpers KVM: arm64: Make most GICv3 accesses UNDEF if they trap KVM: arm64: Honor guest requested traps in GICv3 emulation KVM: arm64: Add trap routing information for ICH_HCR_EL2 KVM: arm64: Add ICH_HCR_EL2 to the vcpu state KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest KVM: arm64: Add helper for last ditch idreg adjustments KVM: arm64: Force GICv3 trap activation when no irqchip is configured on VHE KVM: arm64: Force SRE traps when SRE access is not enabled KVM: arm64: Move GICv3 trap configuration to kvm_calculate_traps() Signed-off-by: Marc Zyngier <maz@kernel.org>
| | * KVM: arm64: Add ICH_HCR_EL2 to the vcpu stateMarc Zyngier2024-08-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | As we are about to describe the trap routing for ICH_HCR_EL2, add the register to the vcpu state in its VNCR form, as well as reset Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
| * | KVM: arm64: Add save/restore support for FPMRMarc Zyngier2024-08-271-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just like the rest of the FP/SIMD state, FPMR needs to be context switched. The only interesting thing here is that we need to treat the pKVM part a bit differently, as the host FP state is never written back to the vcpu thread, but instead stored locally and eagerly restored. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
| * | KVM: arm64: Move FPMR into the sysreg arrayMarc Zyngier2024-08-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just like SVCR, FPMR is currently stored at the wrong location. Let's move it where it belongs. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
| * | KVM: arm64: Add predicate for FPMR support in a VMMarc Zyngier2024-08-271-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we are about to check for the advertisement of FPMR support to a guest in a number of places, add a predicate that will gate most of the support code for FPMR. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
| * | KVM: arm64: Move SVCR into the sysreg arrayMarc Zyngier2024-08-271-1/+3
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SVCR is just a system register, and has no purpose being outside of the sysreg array. If anything, it only makes it more difficult to eventually support SME one day. If ever. Move it into the array with its little friends, and associate it with a visibility predicate. Although this is dead code, it at least paves the way for the next set of FP-related extensions. Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240820131802.3547589-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
* | Merge branch 'for-next/poe' into for-next/coreWill Deacon2024-09-121-0/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * for-next/poe: (31 commits) arm64: pkeys: remove redundant WARN kselftest/arm64: Add test case for POR_EL0 signal frame records kselftest/arm64: parse POE_MAGIC in a signal frame kselftest/arm64: add HWCAP test for FEAT_S1POE selftests: mm: make protection_keys test work on arm64 selftests: mm: move fpregs printing kselftest/arm64: move get_header() arm64: add Permission Overlay Extension Kconfig arm64: enable PKEY support for CPUs with S1POE arm64: enable POE and PIE to coexist arm64/ptrace: add support for FEAT_POE arm64: add POE signal support arm64: implement PKEYS support arm64: add pte_access_permitted_no_overlay() arm64: handle PKEY/POE faults arm64: mask out POIndex when modifying a PTE arm64: convert protection key into vm_flags and pgprot values arm64: add POIndex defines arm64: re-order MTE VM_ flags arm64: enable the Permission Overlay Extension for EL0 ...
| * | KVM: arm64: Save/restore POE registersJoey Gouly2024-09-041-0/+4
| |/ | | | | | | | | | | | | | | | | | | | | | | | | Define the new system registers that POE introduces and context switch them. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240822151113.1479789-8-joey.gouly@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* / perf: arm_pmuv3: Prepare for more than 32 countersRob Herring (Arm)2024-08-161-4/+4
|/ | | | | | | | | | | | | | Various PMUv3 registers which are a mask of counters are 64-bit registers, but the accessor functions take a u32. This has been fine as the upper 32-bits have been RES0 as there has been a maximum of 32 counters prior to Armv9.4/8.9. With Armv9.4/8.9, a 33rd counter is added. Update the accessor functions to use a u64 instead. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-2-280a8d7ff465@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
* Merge tag 'kvm-x86-generic-6.11' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini2024-07-161-1/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM generic changes for 6.11 - Enable halt poll shrinking by default, as Intel found it to be a clear win. - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86. - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out(). - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs. - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout. - A few minor cleanups
| * KVM: Delete the now unused kvm_arch_sched_in()Sean Christopherson2024-06-111-1/+0
| | | | | | | | | | | | | | | | | | Delete kvm_arch_sched_in() now that all implementations are nops. Reviewed-by: Bibo Mao <maobibo@loongson.cn> Acked-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20240522014013.1672962-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
* | Merge branch kvm-arm64/nv-sve into kvmarm/nextOliver Upton2024-07-141-0/+6
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/nv-sve: : CPTR_EL2, FPSIMD/SVE support for nested : : This series brings support for honoring the guest hypervisor's CPTR_EL2 : trap configuration when running a nested guest, along with support for : FPSIMD/SVE usage at L1 and L2. KVM: arm64: Allow the use of SVE+NV KVM: arm64: nv: Add additional trap setup for CPTR_EL2 KVM: arm64: nv: Add trap description for CPTR_EL2 KVM: arm64: nv: Add TCPAC/TTA to CPTR->CPACR conversion helper KVM: arm64: nv: Honor guest hypervisor's FP/SVE traps in CPTR_EL2 KVM: arm64: nv: Load guest FP state for ZCR_EL2 trap KVM: arm64: nv: Handle CPACR_EL1 traps KVM: arm64: Spin off helper for programming CPTR traps KVM: arm64: nv: Ensure correct VL is loaded before saving SVE state KVM: arm64: nv: Use guest hypervisor's max VL when running nested guest KVM: arm64: nv: Save guest's ZCR_EL2 when in hyp context KVM: arm64: nv: Load guest hyp's ZCR into EL1 state KVM: arm64: nv: Handle ZCR_EL2 traps KVM: arm64: nv: Forward SVE traps to guest hypervisor KVM: arm64: nv: Forward FP/ASIMD traps to guest hypervisor Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| * | KVM: arm64: nv: Load guest hyp's ZCR into EL1 stateOliver Upton2024-06-201-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Load the guest hypervisor's ZCR_EL2 into the corresponding EL1 register when restoring SVE state, as ZCR_EL2 affects the VL in the hypervisor context. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240620164653.1130714-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| * | KVM: arm64: nv: Handle ZCR_EL2 trapsOliver Upton2024-06-201-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike other SVE-related registers, ZCR_EL2 takes a sysreg trap to EL2 when HCR_EL2.NV = 1. KVM still needs to honor the guest hypervisor's trap configuration, which expects an SVE trap (i.e. ESR_EL2.EC = 0x19) when CPTR traps are enabled for the vCPU's current context. Otherwise, if the guest hypervisor has traps disabled, emulate the access by mapping the requested VL into ZCR_EL1. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240620164653.1130714-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* | | Merge branch kvm-arm64/ctr-el0 into kvmarm/nextOliver Upton2024-07-141-4/+22
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/ctr-el0: : Support for user changes to CTR_EL0, courtesy of Sebastian Ott : : Allow userspace to change the guest-visible value of CTR_EL0 for a VM, : so long as the requested value represents a subset of features supported : by hardware. In other words, prevent the VMM from over-promising the : capabilities of hardware. : : Make this happen by fitting CTR_EL0 into the existing infrastructure for : feature ID registers. KVM: selftests: Assert that MPIDR_EL1 is unchanged across vCPU reset KVM: arm64: nv: Unfudge ID_AA64PFR0_EL1 masking KVM: selftests: arm64: Test writes to CTR_EL0 KVM: arm64: rename functions for invariant sys regs KVM: arm64: show writable masks for feature registers KVM: arm64: Treat CTR_EL0 as a VM feature ID register KVM: arm64: unify code to prepare traps KVM: arm64: nv: Use accessors for modifying ID registers KVM: arm64: Add helper for writing ID regs KVM: arm64: Use read-only helper for reading VM ID registers KVM: arm64: Make idregs debugfs iterator search sysreg table directly KVM: arm64: Get sys_reg encoding from descriptor in idregs_debug_show() Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| * | | KVM: arm64: Treat CTR_EL0 as a VM feature ID registerSebastian Ott2024-06-201-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CTR_EL0 is currently handled as an invariant register, thus guests will be presented with the host value of that register. Add emulation for CTR_EL0 based on a per VM value. Userspace can switch off DIC and IDC bits and reduce DminLine and IminLine sizes. Naturally, ensure CTR_EL0 is trapped (HCR_EL2.TID2=1) any time that a VM's CTR_EL0 differs from hardware. Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-8-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| * | | KVM: arm64: unify code to prepare trapsSebastian Ott2024-06-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are 2 functions to calculate traps via HCR_EL2: * kvm_init_sysreg() called via KVM_RUN (before the 1st run or when the pid changes) * vcpu_reset_hcr() called via KVM_ARM_VCPU_INIT To unify these 2 and to support traps that are dependent on the ID register configuration, move the code from vcpu_reset_hcr() to sys_regs.c and call it via kvm_init_sysreg(). We still have to keep the non-FWB handling stuff in vcpu_reset_hcr(). Also the initialization with HCR_GUEST_FLAGS is kept there but guarded by !vcpu_has_run_once() to ensure that previous calculated values don't get overwritten. While at it rename kvm_init_sysreg() to kvm_calculate_traps() to better reflect what it's doing. Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| * | | KVM: arm64: nv: Use accessors for modifying ID registersOliver Upton2024-06-201-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the interest of abstracting away the underlying storage of feature ID registers, rework the nested code to go through the accessors instead of directly iterating the id_regs array. This means we now lose the property that ID registers unknown to the nested code get zeroed, but we really ought to be handling those explicitly going forward. Link: https://lore.kernel.org/r/20240619174036.483943-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| * | | KVM: arm64: Add helper for writing ID regsOliver Upton2024-06-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the remaining usage of IDREG() with a new helper for setting the value of a feature ID register, with the benefit of cramming in some extra sanity checks. Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| * | | KVM: arm64: Use read-only helper for reading VM ID registersOliver Upton2024-06-201-1/+15
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IDREG() expands to the storage of a particular ID reg, which can be useful for handling both reads and writes. However, outside of a select few situations, the ID registers should be considered read only. Replace current readers with a new macro that expands to the value of the field rather than the field itself. Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* | | KVM: arm64: nv: Support multiple nested Stage-2 mmu structuresMarc Zyngier2024-06-191-0/+36
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add Stage-2 mmu data structures for virtual EL2 and for nested guests. We don't yet populate shadow Stage-2 page tables, but we now have a framework for getting to a shadow Stage-2 pgd. We allocate twice the number of vcpus as Stage-2 mmu structures because that's sufficient for each vcpu running two translation regimes without having to flush the Stage-2 page tables. Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
* | KVM: arm64: Consolidate initializing the host data's fpsimd_state/sve in pKVMFuad Tabba2024-06-041-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | Now that we have introduced finalize_init_hyp_mode(), lets consolidate the initializing of the host_data fpsimd_state and sve state. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240603122852.3923848-8-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
* | KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVMFuad Tabba2024-06-041-0/+17
|/ | | | | | | | | | | | | | | | Protected mode needs to maintain (save/restore) the host's sve state, rather than relying on the host kernel to do that. This is to avoid leaking information to the host about guests and the type of operations they are performing. As a first step towards that, allocate memory mapped at hyp, per cpu, for the host sve state. The following patch will use this memory to save/restore the host state. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
* Merge branch kvm-arm64/mpidr-reset into kvmarm-master/nextMarc Zyngier2024-05-091-0/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/mpidr-reset: : . : Fixes for CLIDR_EL1 and MPIDR_EL1 being accidentally mutable across : a vcpu reset, courtesy of Oliver. From the cover letter: : : "For VM-wide feature ID registers we ensure they get initialized once for : the lifetime of a VM. On the other hand, vCPU-local feature ID registers : get re-initialized on every vCPU reset, potentially clobbering the : values userspace set up. : : MPIDR_EL1 and CLIDR_EL1 are the only registers in this space that we : allow userspace to modify for now. Clobbering the value of MPIDR_EL1 has : some disastrous side effects as the compressed index used by the : MPIDR-to-vCPU lookup table assumes MPIDR_EL1 is immutable after KVM_RUN. : : Series + reproducer test case to address the problem of KVM wiping out : userspace changes to these registers. Note that there are still some : differences between VM and vCPU scoped feature ID registers from the : perspective of userspace. We do not allow the value of VM-scope : registers to change after KVM_RUN, but vCPU registers remain mutable." : . KVM: selftests: arm64: Test vCPU-scoped feature ID registers KVM: selftests: arm64: Test that feature ID regs survive a reset KVM: selftests: arm64: Store expected register value in set_id_regs KVM: selftests: arm64: Rename helper in set_id_regs to imply VM scope KVM: arm64: Only reset vCPU-scoped feature ID regs once KVM: arm64: Reset VM feature ID regs from kvm_reset_sys_regs() KVM: arm64: Rename is_id_reg() to imply VM scope Signed-off-by: Marc Zyngier <maz@kernel.org>
| * KVM: arm64: Only reset vCPU-scoped feature ID regs onceOliver Upton2024-05-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The general expecation with feature ID registers is that they're 'reset' exactly once by KVM for the lifetime of a vCPU/VM, such that any userspace changes to the CPU features / identity are honored after a vCPU gets reset (e.g. PSCI_ON). KVM handles what it calls VM-scoped feature ID registers correctly, but feature ID registers local to a vCPU (CLIDR_EL1, MPIDR_EL1) get wiped after every reset. What's especially concerning is that a potentially-changing MPIDR_EL1 breaks MPIDR compression for indexing mpidr_data, as the mask of useful bits to build the index could change. This is absolutely no good. Avoid resetting vCPU feature ID registers more than once. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240502233529.1958459-4-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
* | Merge branch kvm-arm64/misc-6.10 into kvmarm-master/nextMarc Zyngier2024-05-081-13/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/misc-6.10: : . : Misc fixes and updates targeting 6.10 : : - Improve boot-time diagnostics when the sysreg tables : are not correctly sorted : : - Allow FFA_MSG_SEND_DIRECT_REQ in the FFA proxy : : - Fix duplicate XNX field in the ID_AA64MMFR1_EL1 : writeable mask : : - Allocate PPIs and SGIs outside of the vcpu structure, allowing : for smaller EL2 mapping and some flexibility in implementing : more or less than 32 private IRQs. : : - Use bitmap_gather() instead of its open-coded equivalent : : - Make protected mode use hVHE if available : : - Purge stale mpidr_data if a vcpu is created after the MPIDR : map has been created : . KVM: arm64: Destroy mpidr_data for 'late' vCPU creation KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support KVM: arm64: Fix hvhe/nvhe early alias parsing KVM: arm64: Convert kvm_mpidr_index() to bitmap_gather() KVM: arm64: vgic: Allocate private interrupts on demand KVM: arm64: Remove duplicated AA64MMFR1_EL1 XNX KVM: arm64: Remove FFA_MSG_SEND_DIRECT_REQ from the denylist KVM: arm64: Improve out-of-order sysreg table diagnostics Signed-off-by: Marc Zyngier <maz@kernel.org>
| * | KVM: arm64: Convert kvm_mpidr_index() to bitmap_gather()Marc Zyngier2024-05-031-13/+3
| |/ | | | | | | | | | | | | | | | | | | | | | | Linux 6.9 has introduced new bitmap manipulation helpers, with bitmap_gather() being of special interest, as it does exactly what kvm_mpidr_index() is already doing. Make the latter a wrapper around the former. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240502154247.3012042-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
* | Merge branch kvm-arm64/pkvm-6.10 into kvmarm-master/nextMarc Zyngier2024-05-031-7/+16
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/pkvm-6.10: (25 commits) : . : At last, a bunch of pKVM patches, courtesy of Fuad Tabba. : From the cover letter: : : "This series is a bit of a bombay-mix of patches we've been : carrying. There's no one overarching theme, but they do improve : the code by fixing existing bugs in pKVM, refactoring code to : make it more readable and easier to re-use for pKVM, or adding : functionality to the existing pKVM code upstream." : . KVM: arm64: Force injection of a data abort on NISV MMIO exit KVM: arm64: Restrict supported capabilities for protected VMs KVM: arm64: Refactor setting the return value in kvm_vm_ioctl_enable_cap() KVM: arm64: Document the KVM/arm64-specific calls in hypercalls.rst KVM: arm64: Rename firmware pseudo-register documentation file KVM: arm64: Reformat/beautify PTP hypercall documentation KVM: arm64: Clarify rationale for ZCR_EL1 value restored on guest exit KVM: arm64: Introduce and use predicates that check for protected VMs KVM: arm64: Add is_pkvm_initialized() helper KVM: arm64: Simplify vgic-v3 hypercalls KVM: arm64: Move setting the page as dirty out of the critical section KVM: arm64: Change kvm_handle_mmio_return() return polarity KVM: arm64: Fix comment for __pkvm_vcpu_init_traps() KVM: arm64: Prevent kmemleak from accessing .hyp.data KVM: arm64: Do not map the host fpsimd state to hyp in pKVM KVM: arm64: Rename __tlb_switch_to_{guest,host}() in VHE KVM: arm64: Support TLB invalidation in guest context KVM: arm64: Avoid BBM when changing only s/w bits in Stage-2 PTE KVM: arm64: Check for PTE validity when checking for executable/cacheable KVM: arm64: Avoid BUG-ing from the host abort path ... Signed-off-by: Marc Zyngier <maz@kernel.org>
| * | KVM: arm64: Introduce and use predicates that check for protected VMsFuad Tabba2024-05-011-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to determine whether or not a VM or vcpu are protected, introduce helpers to query this state. While at it, use the vcpu helper to check vcpus protected state instead of the kvm one. Co-authored-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-19-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
| * | KVM: arm64: Do not map the host fpsimd state to hyp in pKVMFuad Tabba2024-05-011-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pKVM maintains its own state at EL2 for tracking the host fpsimd state. Therefore, no need to map and share the host's view with it. Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-12-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
| * | KVM: arm64: Refactor checks for FP state ownershipFuad Tabba2024-05-011-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To avoid direct comparison against the fp_owner enum, add a new function that performs the check, host_owns_fp_regs(), to complement the existing guest_owns_fp_regs(). To check for fpsimd state ownership, use the helpers instead of directly using the enums. No functional change intended. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
| * | KVM: arm64: Move guest_owns_fp_regs() to increase its scopeFuad Tabba2024-05-011-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | guest_owns_fp_regs() will be used to check fpsimd state ownership across kvm/arm64. Therefore, move it to kvm_host.h to widen its scope. Moreover, the host state is not per-vcpu anymore, the vcpu parameter isn't used, so remove it as well. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
* | | Merge branch kvm-arm64/nv-eret-pauth into kvmarm-master/nextMarc Zyngier2024-05-031-3/+23
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/nv-eret-pauth: : . : Add NV support for the ERETAA/ERETAB instructions. From the cover letter: : : "Although the current upstream NV support has *some* support for : correctly emulating ERET, that support is only partial as it doesn't : support the ERETAA and ERETAB variants. : : Supporting these instructions was cast aside for a long time as it : involves implementing some form of PAuth emulation, something I wasn't : overly keen on. But I have reached a point where enough of the : infrastructure is there that it actually makes sense. So here it is!" : . KVM: arm64: nv: Work around lack of pauth support in old toolchains KVM: arm64: Drop trapping of PAuth instructions/keys KVM: arm64: nv: Advertise support for PAuth KVM: arm64: nv: Handle ERETA[AB] instructions KVM: arm64: nv: Add emulation for ERETAx instructions KVM: arm64: nv: Add kvm_has_pauth() helper KVM: arm64: nv: Reinject PAC exceptions caused by HCR_EL2.API==0 KVM: arm64: nv: Handle HCR_EL2.{API,APK} independently KVM: arm64: nv: Honor HFGITR_EL2.ERET being set KVM: arm64: nv: Fast-track 'InHost' exception returns KVM: arm64: nv: Add trap forwarding for ERET and SMC KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2 KVM: arm64: nv: Drop VCPU_HYP_CONTEXT flag KVM: arm64: Constraint PAuth support to consistent implementations KVM: arm64: Add helpers for ESR_ELx_ERET_ISS_ERET* KVM: arm64: Harden __ctxt_sys_reg() against out-of-range values Signed-off-by: Marc Zyngier <maz@kernel.org>
| * | KVM: arm64: nv: Add kvm_has_pauth() helperMarc Zyngier2024-04-201-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pointer Authentication comes in many flavors, and a faithful emulation relies on correctly handling the flavour implemented by the HW. For this, provide a new kvm_has_pauth() that checks whether we expose to the guest a particular level of support. This checks across all 3 possible authentication algorithms (Q5, Q3 and IMPDEF). Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240419102935.1935571-12-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
| * | KVM: arm64: nv: Drop VCPU_HYP_CONTEXT flagMarc Zyngier2024-04-201-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It has become obvious that HCR_EL2.NV serves the exact same use as VCPU_HYP_CONTEXT, only in an architectural way. So just drop the flag for good. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240419102935.1935571-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
| * | KVM: arm64: Harden __ctxt_sys_reg() against out-of-range valuesMarc Zyngier2024-04-201-1/+8
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The unsuspecting kernel tinkerer can be easily confused into writing something that looks like this: ikey.lo = __vcpu_sys_reg(vcpu, SYS_APIAKEYLO_EL1); which seems vaguely sensible, until you realise that the second parameter is the encoding of a sysreg, and not the index into the vcpu sysreg file... Debugging what happens in this case is an interesting exercise in head<->wall interactions. As they often say: "Any resemblance to actual persons, living or dead, or actual events is purely coincidental". In order to save people's time, add some compile-time hardening that will at least weed out the "stupidly out of range" values. This will *not* catch anything that isn't a compile-time constant. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240419102935.1935571-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
* | KVM: arm64: Exclude FP ownership from kvm_vcpu_archMarc Zyngier2024-04-121-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | In retrospect, it is fairly obvious that the FP state ownership is only meaningful for a given CPU, and that locating this information in the vcpu was just a mistake. Move the ownership tracking into the host data structure, and rename it from fp_state to fp_owner, which is a better description (name suggested by Mark Brown). Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
* | KVM: arm64: Exclude host_fpsimd_state pointer from kvm_vcpu_archMarc Zyngier2024-04-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | As the name of the field indicates, host_fpsimd_state is strictly a host piece of data, and we reset this pointer on each PID change. So let's move it where it belongs, and set it at load-time. Although this is slightly more often, it is a well defined life-cycle which matches other pieces of data. Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
* | KVM: arm64: Exclude mdcr_el2_host from kvm_vcpu_archMarc Zyngier2024-04-121-3/+2
| | | | | | | | | | | | | | | | | | As for the rest of the host debug state, the host copy of mdcr_el2 has little to do in the vcpu, and is better placed in the host_data structure. Reviewed-by : Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* | KVM: arm64: Exclude host_debug_data from vcpu_archMarc Zyngier2024-04-121-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | Keeping host_debug_state on a per-vcpu basis is completely pointless. The lifetime of this data is only that of the inner run-loop, which means it is never accessed outside of the core EL2 code. Move the structure into kvm_host_data, and save over 500 bytes per vcpu. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* | KVM: arm64: Add accessor for per-CPU stateMarc Zyngier2024-04-121-0/+37
|/ | | | | | | | | | In order to facilitate the introduction of new per-CPU state, add a new host_data_ptr() helped that hides some of the per-CPU verbosity, and make it easier to move that state around in the future. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2024-03-151-1/+98
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm updates from Paolo Bonzini: "S390: - Changes to FPU handling came in via the main s390 pull request - Only deliver to the guest the SCLP events that userspace has requested - More virtual vs physical address fixes (only a cleanup since virtual and physical address spaces are currently the same) - Fix selftests undefined behavior x86: - Fix a restriction that the guest can't program a PMU event whose encoding matches an architectural event that isn't included in the guest CPUID. The enumeration of an architectural event only says that if a CPU supports an architectural event, then the event can be programmed *using the architectural encoding*. The enumeration does NOT say anything about the encoding when the CPU doesn't report support the event *in general*. It might support it, and it might support it using the same encoding that made it into the architectural PMU spec - Fix a variety of bugs in KVM's emulation of RDPMC (more details on individual commits) and add a selftest to verify KVM correctly emulates RDMPC, counter availability, and a variety of other PMC-related behaviors that depend on guest CPUID and therefore are easier to validate with selftests than with custom guests (aka kvm-unit-tests) - Zero out PMU state on AMD if the virtual PMU is disabled, it does not cause any bug but it wastes time in various cases where KVM would check if a PMC event needs to be synthesized - Optimize triggering of emulated events, with a nice ~10% performance improvement in VM-Exit microbenchmarks when a vPMU is exposed to the guest - Tighten the check for "PMI in guest" to reduce false positives if an NMI arrives in the host while KVM is handling an IRQ VM-Exit - Fix a bug where KVM would report stale/bogus exit qualification information when exiting to userspace with an internal error exit code - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support - Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for read, e.g. to avoid serializing vCPUs when userspace deletes a memslot - Tear down TDP MMU page tables at 4KiB granularity (used to be 1GiB). KVM doesn't support yielding in the middle of processing a zap, and 1GiB granularity resulted in multi-millisecond lags that are quite impolite for CONFIG_PREEMPT kernels - Allocate write-tracking metadata on-demand to avoid the memory overhead when a kernel is built with i915 virtualization support but the workloads use neither shadow paging nor i915 virtualization - Explicitly initialize a variety of on-stack variables in the emulator that triggered KMSAN false positives - Fix the debugregs ABI for 32-bit KVM - Rework the "force immediate exit" code so that vendor code ultimately decides how and when to force the exit, which allowed some optimization for both Intel and AMD - Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if vCPU creation ultimately failed, causing extra unnecessary work - Cleanup the logic for checking if the currently loaded vCPU is in-kernel - Harden against underflowing the active mmu_notifier invalidation count, so that "bad" invalidations (usually due to bugs elsehwere in the kernel) are detected earlier and are less likely to hang the kernel x86 Xen emulation: - Overlay pages can now be cached based on host virtual address, instead of guest physical addresses. This removes the need to reconfigure and invalidate the cache if the guest changes the gpa but the underlying host virtual address remains the same - When possible, use a single host TSC value when computing the deadline for Xen timers in order to improve the accuracy of the timer emulation - Inject pending upcall events when the vCPU software-enables its APIC to fix a bug where an upcall can be lost (and to follow Xen's behavior) - Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen events fails, e.g. if the guest has aliased xAPIC IDs RISC-V: - Support exception and interrupt handling in selftests - New self test for RISC-V architectural timer (Sstc extension) - New extension support (Ztso, Zacas) - Support userspace emulation of random number seed CSRs ARM: - Infrastructure for building KVM's trap configuration based on the architectural features (or lack thereof) advertised in the VM's ID registers - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to x86's WC) at stage-2, improving the performance of interacting with assigned devices that can tolerate it - Conversion of KVM's representation of LPIs to an xarray, utilized to address serialization some of the serialization on the LPI injection path - Support for _architectural_ VHE-only systems, advertised through the absence of FEAT_E2H0 in the CPU's ID register - Miscellaneous cleanups, fixes, and spelling corrections to KVM and selftests LoongArch: - Set reserved bits as zero in CPUCFG - Start SW timer only when vcpu is blocking - Do not restart SW timer when it is expired - Remove unnecessary CSR register saving during enter guest - Misc cleanups and fixes as usual Generic: - Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically always true on all architectures except MIPS (where Kconfig determines the available depending on CPU capabilities). It is replaced either by an architecture-dependent symbol for MIPS, and IS_ENABLED(CONFIG_KVM) everywhere else - Factor common "select" statements in common code instead of requiring each architecture to specify it - Remove thoroughly obsolete APIs from the uapi headers - Move architecture-dependent stuff to uapi/asm/kvm.h - Always flush the async page fault workqueue when a work item is being removed, especially during vCPU destruction, to ensure that there are no workers running in KVM code when all references to KVM-the-module are gone, i.e. to prevent a very unlikely use-after-free if kvm.ko is unloaded - Grab a reference to the VM's mm_struct in the async #PF worker itself instead of gifting the worker a reference, so that there's no need to remember to *conditionally* clean up after the worker Selftests: - Reduce boilerplate especially when utilize selftest TAP infrastructure - Add basic smoke tests for SEV and SEV-ES, along with a pile of library support for handling private/encrypted/protected memory - Fix benign bugs where tests neglect to close() guest_memfd files" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits) selftests: kvm: remove meaningless assignments in Makefiles KVM: riscv: selftests: Add Zacas extension to get-reg-list test RISC-V: KVM: Allow Zacas extension for Guest/VM KVM: riscv: selftests: Add Ztso extension to get-reg-list test RISC-V: KVM: Allow Ztso extension for Guest/VM RISC-V: KVM: Forward SEED CSR access to user space KVM: riscv: selftests: Add sstc timer test KVM: riscv: selftests: Change vcpu_has_ext to a common function KVM: riscv: selftests: Add guest helper to get vcpu id KVM: riscv: selftests: Add exception handling support LoongArch: KVM: Remove unnecessary CSR register saving during enter guest LoongArch: KVM: Do not restart SW timer when it is expired LoongArch: KVM: Start SW timer only when vcpu is blocking LoongArch: KVM: Set reserved bits as zero in CPUCFG KVM: selftests: Explicitly close guest_memfd files in some gmem tests KVM: x86/xen: fix recursive deadlock in timer injection KVM: pfncache: simplify locking and make more self-contained KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled KVM: x86/xen: improve accuracy of Xen timers ...
| * KVM: arm64: Don't initialize idreg debugfs w/ preemption disabledOliver Upton2024-02-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Testing KVM with DEBUG_ATOMIC_SLEEP enabled doesn't get far before hitting the first splat: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 13062, name: vgic_lpi_stress preempt_count: 1, expected: 0 2 locks held by vgic_lpi_stress/13062: #0: ffff080084553240 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0xc0/0x13f0 #1: ffff800080485f08 (&kvm->arch.config_lock){+.+.}-{3:3}, at: kvm_arch_vcpu_ioctl+0xd60/0x1788 CPU: 19 PID: 13062 Comm: vgic_lpi_stress Tainted: G W O 6.8.0-dbg-DEV #1 Call trace: dump_backtrace+0xf8/0x148 show_stack+0x20/0x38 dump_stack_lvl+0xb4/0xf8 dump_stack+0x18/0x40 __might_resched+0x248/0x2a0 __might_sleep+0x50/0x88 down_write+0x30/0x150 start_creating+0x90/0x1a0 __debugfs_create_file+0x5c/0x1b0 debugfs_create_file+0x34/0x48 kvm_reset_sys_regs+0x120/0x1e8 kvm_reset_vcpu+0x148/0x270 kvm_arch_vcpu_ioctl+0xddc/0x1788 kvm_vcpu_ioctl+0xb6c/0x13f0 __arm64_sys_ioctl+0x98/0xd8 invoke_syscall+0x48/0x108 el0_svc_common+0xb4/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x54/0x128 el0t_64_sync_handler+0x68/0xc0 el0t_64_sync+0x1a8/0x1b0 kvm_reset_vcpu() disables preemption as it needs to unload vCPU state from the CPU to twiddle with it, which subsequently explodes when taking the parent inode's rwsem while creating the idreg debugfs file. Fix it by moving the initialization to kvm_arch_create_vm_debugfs(). Fixes: 891766581dea ("KVM: arm64: Add debugfs file for guest's ID registers") Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240227094115.1723330-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| * KVM: arm64: Add debugfs file for guest's ID registersMarc Zyngier2024-02-191-0/+3
| | | | | | | | | | | | | | | | | | Debugging ID register setup can be a complicated affair. Give the kernel hacker a way to dump that state in an easy to parse way. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-27-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| * KVM: arm64: Make FEAT_MOPS UNDEF if not advertised to the guestMarc Zyngier2024-02-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | We unconditionally enable FEAT_MOPS, which is obviously wrong. So let's only do that when it is advertised to the guest. Which means we need to rely on a per-vcpu HCRX_EL2 shadow register. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20240214131827.2856277-25-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| * KVM: arm64: Move existing feature disabling over to FGU infrastructureMarc Zyngier2024-02-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | We already trap a bunch of existing features for the purpose of disabling them (MAIR2, POR, ACCDATA, SME...). Let's move them over to our brand new FGU infrastructure. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-20-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| * KVM: arm64: Add Fine-Grained UNDEF tracking informationMarc Zyngier2024-02-191-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to efficiently handle system register access being disabled, and this resulting in an UNDEF exception being injected, we introduce the (slightly dubious) concept of Fine-Grained UNDEF, modeled after the architectural Fine-Grained Traps. For each FGT group, we keep a 64 bit word that has the exact same bit assignment as the corresponding FGT register, where a 1 indicates that trapping this register should result in an UNDEF exception being reinjected. So far, nothing populates this information, nor sets the corresponding trap bits. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-18-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| * KVM: arm64: Register AArch64 system register entries with the sysreg xarrayMarc Zyngier2024-02-191-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to reduce the number of lookups that we have to perform when handling a sysreg, register each AArch64 sysreg descriptor with the global xarray. The index of the descriptor is stored as a 10 bit field in the data word. Subsequent patches will retrieve and use the stored index. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-15-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| * KVM: arm64: nv: Add sanitising to VNCR-backed sysregsMarc Zyngier2024-02-191-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VNCR-backed "registers" are actually only memory. Which means that there is zero control over what the guest can write, and that it is the hypervisor's job to actually sanitise the content of the backing store. Yeah, this is fun. In order to preserve some form of sanity, add a repainting mechanism that makes use of a per-VM set of RES0/RES1 masks, one pair per VNCR register. These masks get applied on access to the backing store via __vcpu_sys_reg(), ensuring that the state that is consumed by KVM is correct. So far, nothing populates these masks, but stay tuned. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20240214131827.2856277-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>