summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* thunderbolt: Prefix CL state related log messages with "CLx: "Mika Westerberg2023-06-091-4/+4
| | | | | | | | This makes it easier to spot from the logs and follows what we do with the TMU code already. We also log enabling/disabling CL states using the tb_sw_dbg() instead of tb_port_dbg(). Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Prefix TMU post time log message with "TMU: "Mika Westerberg2023-06-091-1/+1
| | | | | | | Following what we do with other messages in this file. No functional changes. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Do not call CLx functions from TMU codeMika Westerberg2023-06-091-23/+0
| | | | | | | | There is really no need to call any of the CLx functions in the TMU code so remove all these checks. This makes the TMU enable/disable flows easier to follow as well. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Check for first depth router in tb.cMika Westerberg2023-06-092-16/+16
| | | | | | | | | | | Currently tb_switch_clx_enable() enables CL states only for the first depth router. This is something we may want to change in the future and in addition it is not visible from the calling path at all. For this reason do the check in the tb.c so it is immediately visible that we only do this for the first depth router. Fix the kernel-docs accordingly. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Switch CL states from enum to a bitmaskMika Westerberg2023-06-094-119/+113
| | | | | | | | | This is more natural and follows the hardware register layout better. This makes it easier to see which CL states we enable (even though they should be enabled together). Rename 'clx_mask' to 'clx' everywhere as this is now always bitmask. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Move CLx enabling into tb_enable_clx()Mika Westerberg2023-06-091-17/+17
| | | | | | | This avoids some duplication and makes the flow slightly easier to understand. Also follows what we do in tb_enable_tmu(). Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Get rid of __tb_switch_[en|dis]able_clx()Mika Westerberg2023-06-091-49/+42
| | | | | | | | | No need to have separate functions for these so fold them into tb_switch_clx_enable() and tb_switch_clx_disable() accordingly. No functional changes. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Move CLx support functions into clx.cMika Westerberg2023-06-097-378/+381
| | | | | | | | | | There really don't belong to switch.c so move them into their own file. As we do this rename the functions to match the conventions used elsewhere in the driver. No functional changes. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Check valid TMU configuration in tb_switch_tmu_configure()Mika Westerberg2023-06-093-10/+14
| | | | | | | Instead of at enable time we can do this already in tb_switch_tmu_configure(). Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Move tb_enable_tmu() close to other TMU functionsMika Westerberg2023-06-091-29/+29
| | | | | | This makes the code easier to follow. No functional changes. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Move TMU configuration to tb_enable_tmu()Mika Westerberg2023-06-091-20/+10
| | | | | | | | | There is no need to duplicate the code the enables TMU. Also update the comment to better explain why we do this in the first place. No functional changes. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Get rid of tb_switch_enable_tmu_1st_child()Mika Westerberg2023-06-093-40/+34
| | | | | | | | This is better to be part of the software connection manager flows in tb.c. Also name the new function tb_increase_tmu_accuracy() to match what it actually does. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Rework Titan Ridge TMU objection disable functionMika Westerberg2023-06-091-14/+10
| | | | | | | | | | | | | | Now this is split into two with one having a misleading name (tb_switch_tmu_unidirectional_enable()). Make this easier to read, rename and consolidate the two functions into one with name that explains what it actually does. Use the two constants as well that were added but never used to make it clear which bits are being set. No functional changes. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Drop useless 'unidirectional' parameter from ↵Mika Westerberg2023-06-093-14/+9
| | | | | | | | | | | | tb_switch_tmu_is_enabled() There is no point passing it as we already have a field for that. While there clean up the kernel-doc of things that do not really belong to the API documentation (these can be figured out from the spec itself). No functional changes. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Fix a couple of style issues in TMU codeMika Westerberg2023-06-091-12/+11
| | | | | | | Drop extra empty line and get rid of the '__' in function names. No functional changes. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Introduce tb_xdomain_downstream_port()Mika Westerberg2023-06-092-9/+18
| | | | | | | In the same way we did for the routers add a function that returns the parent routers downstream facing port for XDomain devices. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* thunderbolt: Introduce tb_switch_downstream_port()Gil Fine2023-06-097-55/+53
| | | | | | | | | | | | | Introduce tb_switch_downstream_port() helper function that returns the downstream port of a parent switch that is connected to the upstream port of specified switch. From now on, we use it all across the driver where applicable. While there fix a whitespace in comment and rename 'downstream' to 'down' to be consistent with the rest of the driver. Signed-off-by: Gil Fine <gil.fine@intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* Merge branch 'thunderbolt/fixes' into thunderbolt/nextMika Westerberg2023-06-094-13/+25
|\ | | | | | | | | | | We need Thunderbolt/USB4 fixes here as well. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
| * thunderbolt: Mask ring interrupt on Intel hardware as wellMika Westerberg2023-05-311-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When resuming from system sleep states the driver issues following warning on Intel hardware: thunderbolt 0000:07:00.0: interrupt for TX ring 0 is already enabled The reason for this is that the commit in question did not mask the ring interrupt on Intel hardware leaving the interrupt active. Fix this by masking it also in Intel hardware. Reported-by: beld zhang <beldzhang@gmail.com> Tested-by: beld zhang <beldzhang@gmail.com> Closes: https://lore.kernel.org/linux-usb/ZHKW5NeabmfhgLbY@debian.me/ Fixes: c4af8e3fecd0 ("thunderbolt: Clear registers properly when auto clear isn't in use") Cc: stable@vger.kernel.org Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
| * thunderbolt: Do not touch CL state configuration during discoveryMika Westerberg2023-05-291-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the boot firmware has already established tunnels, especially ones that have special requirements from the link such as DisplayPort, we should not blindly enable CL states (nor change the TMU configuration). Otherwise the existing tunnels may not work as expected. For this reason, skip the CL state enabling when we go over the existing topology. This will also keep the TMU settings untouched because we do not change the TMU configuration when CL states are not enabled. Reported-by: Koba Ko <koba.ko@canonical.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7831 Cc: stable@vger.kernel.org # v6.0+ Acked-By: Yehezkel Bernat <YehezkelShB@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
| * thunderbolt: Increase DisplayPort Connection Manager handshake timeoutMika Westerberg2023-05-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It turns out that when plugging in VGA cable through USB-C to VGA/DVI dongle the Connection Manager handshake can take longer time, at least on Intel Titan Ridge based docks such as Dell WD91TB. This leads to following error in the dmesg: thunderbolt 0000:00:0d.3: 3:10: DP tunnel activation failed, aborting and the display stays blank (because we failed to establish the tunnel). For this reason increase the timeout to 3s. Reported-by: Koba Ko <koba.ko@canonical.com> Cc: stable@vger.kernel.org Acked-By: Yehezkel Bernat <YehezkelShB@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
| * thunderbolt: dma_test: Use correct value for absent rings when creating pathsMika Westerberg2023-05-221-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | Both tb_xdomain_enable_paths() and tb_xdomain_disable_paths() expect -1, not 0, if the corresponding ring is not needed. For this reason change the driver to use correct value for the rings that are not needed. Fixes: 180b0689425c ("thunderbolt: Allow multiple DMA tunnels over a single XDomain connection") Cc: stable@vger.kernel.org Reported-by: Pengfei Xu <pengfei.xu@intel.com> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* | thunderbolt: Log DisplayPort adapter rate and lanes on discoveryMika Westerberg2023-05-301-0/+43
| | | | | | | | | | | | | | This may be helpful when debugging possible issues around DisplayPort port tunneling. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* | thunderbolt: Drop retimer vendor checkMika Westerberg2023-05-241-6/+0
| | | | | | | | | | | | | | This is not needed anymore as we already handle unknown vendor in NVM functions. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* | thunderbolt: dma_test: Update MODULE_DESCRIPTIONMika Westerberg2023-05-241-1/+1
| | | | | | | | | | | | | | Make the description match the core driver and the networking with Thunderbolt/USB4 prefix. No functional changes. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* | thunderbolt: Add MODULE_DESCRIPTIONMika Westerberg2023-05-241-0/+1
| | | | | | | | | | | | Add description about the driver to the module. No functional changes. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* | thunderbolt: Allow specifying custom credits for DMA tunnelsMika Westerberg2023-05-241-4/+9
| | | | | | | | | | | | | | The default ones should be find but this allows the user to tweak the credits to get more performance out of the P2P connection. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* | thunderbolt: Add debug log for link controller power quirkMika Westerberg2023-05-241-0/+1
| | | | | | | | | | | | | | Add a debug log to this quirk as well so we can see what quirks have been applied when debugging. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* | thunderbolt: Log function name of the called quirkMika Westerberg2023-05-241-0/+1
| | | | | | | | | | | | This is useful when debugging whether a quirk has been matched or not. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* | thunderbolt: Check for ring 0 in tb_tunnel_alloc_dma()Mika Westerberg2023-05-241-0/+4
|/ | | | | | | | Ring 0 cannot be used for anything else than control channel messages. For this reason add a check to tb_tunnel_alloc_dma() and fail if someone tries to do that. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
* Linux 6.4-rc3v6.4-rc3Linus Torvalds2023-05-211-1/+1
|
* Merge tag 'uml-for-linus-6.4-rc3' of ↵Linus Torvalds2023-05-215-7/+23
|\ | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML fix from Richard Weinberger: - Fix modular build for UML watchdog * tag 'uml-for-linus-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: harddog: fix modular build
| * um: harddog: fix modular buildJohannes Berg2023-05-105-7/+23
| | | | | | | | | | | | | | | | | | | | | | Since we no longer (want to) export any libc symbols the _user portions of any drivers need to be built into image rather than the module. I missed this for the watchdog. Fix the watchdog accordingly. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2023-05-2113-66/+129
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm fixes from Paolo Bonzini: "ARM: - Plug a race in the stage-2 mapping code where the IPA and the PA would end up being out of sync - Make better use of the bitmap API (bitmap_zero, bitmap_zalloc...) - FP/SVE/SME documentation update, in the hope that this field becomes clearer... - Add workaround for Apple SEIS brokenness to a new SoC - Random comment fixes x86: - add MSR_IA32_TSX_CTRL into msrs_to_save - fixes for XCR0 handling in SGX enclaves Generic: - Fix vcpu_array[0] races - Fix race between starting a VM and 'reboot -f'" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: add MSR_IA32_TSX_CTRL into msrs_to_save KVM: x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM) KVM: VMX: Don't rely _only_ on CPUID to enforce XCR0 restrictions for ECREATE KVM: Fix vcpu_array[0] races KVM: VMX: Fix header file dependency of asm/vmx.h KVM: Don't enable hardware after a restart/shutdown is initiated KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown KVM: arm64: vgic: Add Apple M2 PRO/MAX cpus to the list of broken SEIS implementations KVM: arm64: Clarify host SME state management KVM: arm64: Restructure check for SVE support in FP trap handler KVM: arm64: Document check for TIF_FOREIGN_FPSTATE KVM: arm64: Fix repeated words in comments KVM: arm64: Constify start/end/phys fields of the pgtable walker data KVM: arm64: Infer PA offset from VA in hyp map walker KVM: arm64: Infer the PA offset from IPA in stage-2 map walker KVM: arm64: Use the bitmap API to allocate bitmaps KVM: arm64: Slightly optimize flush_context()
| * | KVM: VMX: add MSR_IA32_TSX_CTRL into msrs_to_saveMingwei Zhang2023-05-211-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add MSR_IA32_TSX_CTRL into msrs_to_save[] to explicitly tell userspace to save/restore the register value during migration. Missing this may cause userspace that relies on KVM ioctl(KVM_GET_MSR_INDEX_LIST) fail to port the value to the target VM. In addition, there is no need to add MSR_IA32_TSX_CTRL when ARCH_CAP_TSX_CTRL_MSR is not supported in kvm_get_arch_capabilities(). So add the checking in kvm_probe_msr_to_save(). Fixes: c11f83e0626b ("KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality") Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20230509032348.1153070-1-mizhang@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM)Sean Christopherson2023-05-211-16/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drop KVM's manipulation of guest's CPUID.0x12.1 ECX and EDX, i.e. the allowed XFRM of SGX enclaves, now that KVM explicitly checks the guest's allowed XCR0 when emulating ECREATE. Note, this could theoretically break a setup where userspace advertises a "bad" XFRM and relies on KVM to provide a sane CPUID model, but QEMU is the only known user of KVM SGX, and QEMU explicitly sets the SGX CPUID XFRM subleaf based on the guest's XCR0. Reviewed-by: Kai Huang <kai.huang@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230503160838.3412617-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: VMX: Don't rely _only_ on CPUID to enforce XCR0 restrictions for ECREATESean Christopherson2023-05-211-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Explicitly check the vCPU's supported XCR0 when determining whether or not the XFRM for ECREATE is valid. Checking CPUID works because KVM updates guest CPUID.0x12.1 to restrict the leaf to a subset of the guest's allowed XCR0, but that is rather subtle and KVM should not modify guest CPUID except for modeling true runtime behavior (allowed XFRM is most definitely not "runtime" behavior). Reviewed-by: Kai Huang <kai.huang@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230503160838.3412617-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: Fix vcpu_array[0] racesMichal Luczaj2023-05-191-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In kvm_vm_ioctl_create_vcpu(), add vcpu to vcpu_array iff it's safe to access vcpu via kvm_get_vcpu() and kvm_for_each_vcpu(), i.e. when there's no failure path requiring vcpu removal and destruction. Such order is important because vcpu_array accessors may end up referencing vcpu at vcpu_array[0] even before online_vcpus is set to 1. When online_vcpus=0, any call to kvm_get_vcpu() goes through array_index_nospec() and ends with an attempt to xa_load(vcpu_array, 0): int num_vcpus = atomic_read(&kvm->online_vcpus); i = array_index_nospec(i, num_vcpus); return xa_load(&kvm->vcpu_array, i); Similarly, when online_vcpus=0, a kvm_for_each_vcpu() does not iterate over an "empty" range, but actually [0, ULONG_MAX]: xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, \ (atomic_read(&kvm->online_vcpus) - 1)) In both cases, such online_vcpus=0 edge case, even if leading to unnecessary calls to XArray API, should not be an issue; requesting unpopulated indexes/ranges is handled by xa_load() and xa_for_each_range(). However, this means that when the first vCPU is created and inserted in vcpu_array *and* before online_vcpus is incremented, code calling kvm_get_vcpu()/kvm_for_each_vcpu() already has access to that first vCPU. This should not pose a problem assuming that once a vcpu is stored in vcpu_array, it will remain there, but that's not the case: kvm_vm_ioctl_create_vcpu() first inserts to vcpu_array, then requests a file descriptor. If create_vcpu_fd() fails, newly inserted vcpu is removed from the vcpu_array, then destroyed: vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus); r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT); kvm_get_kvm(kvm); r = create_vcpu_fd(vcpu); if (r < 0) { xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx); kvm_put_kvm_no_destroy(kvm); goto unlock_vcpu_destroy; } atomic_inc(&kvm->online_vcpus); This results in a possible race condition when a reference to a vcpu is acquired (via kvm_get_vcpu() or kvm_for_each_vcpu()) moments before said vcpu is destroyed. Signed-off-by: Michal Luczaj <mhal@rbox.co> Message-Id: <20230510140410.1093987-2-mhal@rbox.co> Cc: stable@vger.kernel.org Fixes: c5b077549136 ("KVM: Convert the kvm->vcpus array to a xarray", 2021-12-08) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: VMX: Fix header file dependency of asm/vmx.hJacob Xu2023-05-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Include a definition of WARN_ON_ONCE() before using it. Fixes: bb1fcc70d98f ("KVM: nVMX: Allow L1 to use 5-level page walks for nested EPT") Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Jacob Xu <jacobhxu@google.com> [reworded commit message; changed <asm/bug.h> to <linux/bug.h>] Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220225012959.1554168-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: Don't enable hardware after a restart/shutdown is initiatedSean Christopherson2023-05-191-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reject hardware enabling, i.e. VM creation, if a restart/shutdown has been initiated to avoid re-enabling hardware between kvm_reboot() and machine_{halt,power_off,restart}(). The restart case is especially problematic (for x86) as enabling VMX (or clearing GIF in KVM_RUN on SVM) blocks INIT, which results in the restart/reboot hanging as BIOS is unable to wake and rendezvous with APs. Note, this bug, and the original issue that motivated the addition of kvm_reboot(), is effectively limited to a forced reboot, e.g. `reboot -f`. In a "normal" reboot, userspace will gracefully teardown userspace before triggering the kernel reboot (modulo bugs, errors, etc), i.e. any process that might do ioctl(KVM_CREATE_VM) is long gone. Fixes: 8e1c18157d87 ("KVM: VMX: Disable VMX when system shutdown") Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Message-Id: <20230512233127.804012-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdownSean Christopherson2023-05-191-15/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use syscore_ops.shutdown to disable hardware virtualization during a reboot instead of using the dedicated reboot_notifier so that KVM disables virtualization _after_ system_state has been updated. This will allow fixing a race in KVM's handling of a forced reboot where KVM can end up enabling hardware virtualization between kernel_restart_prepare() and machine_restart(). Rename KVM's hook to match the syscore op to avoid any possible confusion from wiring up a "reboot" helper to a "shutdown" hook (neither "shutdown nor "reboot" is completely accurate as the hook handles both). Opportunistically rewrite kvm_shutdown()'s comment to make it less VMX specific, and to explain why kvm_rebooting exists. Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: kvmarm@lists.linux.dev Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Cc: Anup Patel <anup@brainfault.org> Cc: Atish Patra <atishp@atishpatra.org> Cc: kvm-riscv@lists.infradead.org Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Message-Id: <20230512233127.804012-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | Merge tag 'kvmarm-fixes-6.4-1' of ↵Paolo Bonzini2023-05-178-25/+76
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.4, take #1 - Plug a race in the stage-2 mapping code where the IPA and the PA would end up being out of sync - Make better use of the bitmap API (bitmap_zero, bitmap_zalloc...) - FP/SVE/SME documentation update, in the hope that this field becomes clearer... - Add workaround for the usual Apple SEIS brokenness - Random comment fixes
| | * \ Merge branch kvm-arm64/pgtable-fixes-6.4 into kvmarm-master/fixesMarc Zyngier2023-05-112-9/+33
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/pgtable-fixes-6.4: : . : Fixes for concurrent S2 mapping race from Oliver: : : "So it appears that there is a race between two parallel stage-2 map : walkers that could lead to mapping the incorrect PA for a given IPA, as : the IPA -> PA relationship picks up an unintended offset. This series : eliminates the problem by using the current IPA of the walk as the : source-of-truth regarding where we are in a map operation." : . KVM: arm64: Constify start/end/phys fields of the pgtable walker data KVM: arm64: Infer PA offset from VA in hyp map walker KVM: arm64: Infer the PA offset from IPA in stage-2 map walker Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | KVM: arm64: Constify start/end/phys fields of the pgtable walker dataMarc Zyngier2023-04-211-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we are revamping the way the pgtable walker evaluates some of the data, make it clear that we rely on somew of the fields to be constant across the lifetime of a walk. For this, flag the start, end and phys fields of the walk data as 'const', which will generate an error if we were to accidentally update these fields again. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | KVM: arm64: Infer PA offset from VA in hyp map walkerOliver Upton2023-04-211-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to the recently fixed stage-2 walker, the hyp map walker increments the PA and VA of a walk separately. Unlike stage-2, there is no bug here as the map walker has exclusive access to the stage-1 page tables. Nonetheless, in the interest of continuity throughout the page table code, tweak the hyp map walker to avoid incrementing the PA and instead use the VA as the authoritative source of how far along a table walk has gotten. Calculate the PA to use for a leaf PTE by adding the offset of the VA from the start of the walk to the starting PA. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230421071606.1603916-3-oliver.upton@linux.dev
| | | * | KVM: arm64: Infer the PA offset from IPA in stage-2 map walkerOliver Upton2023-04-212-4/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, the page table walker counted increments to the PA and IPA of a walk in two separate places. While the PA is incremented as soon as a leaf PTE is installed in stage2_map_walker_try_leaf(), the IPA is actually bumped in the generic table walker context. Critically, __kvm_pgtable_visit() rereads the PTE after the LEAF callback returns to work out if a table or leaf was installed, and only bumps the IPA for a leaf PTE. This arrangement worked fine when we handled faults behind the write lock, as the walker had exclusive access to the stage-2 page tables. However, commit 1577cb5823ce ("KVM: arm64: Handle stage-2 faults in parallel") started handling all stage-2 faults behind the read lock, opening up a race where a walker could increment the PA but not the IPA of a walk. Nothing good ensues, as the walker starts mapping with the incorrect IPA -> PA relationship. For example, assume that two vCPUs took a data abort on the same IPA. One observes that dirty logging is disabled, and the other observed that it is enabled: vCPU attempting PMD mapping vCPU attempting PTE mapping ====================================== ===================================== /* install PMD */ stage2_make_pte(ctx, leaf); data->phys += granule; /* replace PMD with a table */ stage2_try_break_pte(ctx, data->mmu); stage2_make_pte(ctx, table); /* table is observed */ ctx.old = READ_ONCE(*ptep); table = kvm_pte_table(ctx.old, level); /* * map walk continues w/o incrementing * IPA. */ __kvm_pgtable_walk(..., level + 1); Bring an end to the whole mess by using the IPA as the single source of truth for how far along a walk has gotten. Work out the correct PA to map by calculating the IPA offset from the beginning of the walk and add that to the starting physical address. Cc: stable@vger.kernel.org Fixes: 1577cb5823ce ("KVM: arm64: Handle stage-2 faults in parallel") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230421071606.1603916-2-oliver.upton@linux.dev
| | * | | Merge branch kvm-arm64/misc-6.4 into kvmarm-master/fixesMarc Zyngier2023-05-116-16/+43
| | |\ \ \ | | | |_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/misc-6.4: : . : Minor changes for 6.4: : : - Make better use of the bitmap API (bitmap_zero, bitmap_zalloc...) : : - FP/SVE/SME documentation update, in the hope that this field : becomes clearer... : : - Add workaround for the usual Apple SEIS brokenness : : - Random comment fixes : . KVM: arm64: vgic: Add Apple M2 PRO/MAX cpus to the list of broken SEIS implementations KVM: arm64: Clarify host SME state management KVM: arm64: Restructure check for SVE support in FP trap handler KVM: arm64: Document check for TIF_FOREIGN_FPSTATE KVM: arm64: Fix repeated words in comments KVM: arm64: Use the bitmap API to allocate bitmaps KVM: arm64: Slightly optimize flush_context() Signed-off-by: Marc Zyngier <maz@kernel.org>
| | | * | KVM: arm64: vgic: Add Apple M2 PRO/MAX cpus to the list of broken SEIS ↵Marc Zyngier2023-05-112-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implementations Unsurprisingly, the M2 PRO is also affected by the SEIS bug, so add it to the naughty list. And since M2 MAX is likely to be of the same ilk, flag it as well. Tested on a M2 PRO mini machine. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20230501182141.39770-1-maz@kernel.org
| | | * | KVM: arm64: Clarify host SME state managementMark Brown2023-04-211-9/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Normally when running a guest we do not touch the floating point register state until first use of floating point by the guest, saving the current state and loading the guest state at that point. This has been found to offer a performance benefit in common cases. However currently if SME is active when switching to a guest then we exit streaming mode, disable ZA and invalidate the floating point register state prior to starting the guest. The exit from streaming mode is required for correct guest operation, if we leave streaming mode enabled then many non-SME operations can generate SME traps (eg, SVE operations will become streaming SVE operations). If EL1 leaves CPACR_EL1.SMEN disabled then the host is unable to intercept these traps. This will mean that a SME unaware guest will see SME exceptions which will confuse it. Disabling streaming mode also avoids creating spurious indications of usage of the SME hardware which could impact system performance, especially with shared SME implementations. Document the requirement to exit streaming mode clearly. There is no issue with guest operation caused by PSTATE.ZA so we can defer handling for that until first floating point usage, do so if the register state is not that of the current task and hence has already been saved. We could also do this for the case where the register state is that for the current task however this is very unlikely to happen and would require disproportionate effort so continue to save the state in that case. Saving this state on first use would require that we map and unmap storage for the host version of these registers for use by the hypervisor, taking care to deal with protected KVM and the fact that the host can free or reallocate the backing storage. Given that the strong recommendation is that applications should only keep PSTATE.ZA enabled when the state it enables is in active use it is difficult to see a case where a VMM would wish to do this, it would need to not only be using SME but also running the guest in the middle of SME usage. This can be revisited in the future if a use case does arises, in the interim such tasks will work but experience a performance overhead. This brings our handling of SME more into line with our handling of other floating point state and documents more clearly the constraints we have, especially around streaming mode. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221214-kvm-arm64-sme-context-switch-v2-3-57ba0082e9ff@kernel.org
| | | * | KVM: arm64: Restructure check for SVE support in FP trap handlerMark Brown2023-04-211-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We share the same handler for general floating point and SVE traps with a check to make sure we don't handle any SVE traps if the system doesn't have SVE support. Since we will be adding SME support and wishing to handle that along with other FP related traps rewrite the check to be more scalable and a bit clearer too, ensuring we don't misidentify SME traps as SVE ones. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221214-kvm-arm64-sme-context-switch-v2-2-57ba0082e9ff@kernel.org