summaryrefslogtreecommitdiffstats
path: root/arch (follow)
Commit message (Collapse)AuthorAgeFilesLines
* kasan, x86: don't rename memintrinsics in uninstrumented filesMarco Elver2023-03-031-19/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that memcpy/memset/memmove are no longer overridden by KASAN, we can just use the normal symbol names in uninstrumented files. Drop the preprocessor redefinitions. Link: https://lkml.kernel.org/r/20230224085942.1791837-4-elver@google.com Fixes: 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions") Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linux Kernel Functional Testing <lkft@linaro.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* Merge tag 'clk-for-linus' of ↵Linus Torvalds2023-02-261-4/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk updates from Stephen Boyd: "We have one small patch to the clk core this time around. It fixes a corner case with the CLK_OPS_PARENT_ENABLE flag combined with clk_core_is_enabled() where it hangs the system. We'll simply assume the clk is disabled if the parent is disabled and the flag is set. Trying to turn on the parent to check the enable state of the clk runs into system hangs at boot. We let this bake in -next for a couple weeks to make sure there aren't any more issues because the last attempt to fix this ran into hangs and had to be reverted. Note: There were some more patches to the core framework around sync_state and disabling unused clks, but I asked for that to be reverted from the qcom PR because it isn't ready and we're still discussing the best solution on the list. Outside of the core clk framework, we have the usual collection of clk driver updates and support for new SoCs (which seems to never stop). The dirstat is dominated by Qualcomm because they added support for quite a few SoCs this time around and also migrated quite a few of their drivers to clk_parent_data. The other big diff is in the Mediatek clk drivers that saw a significant rework this cycle to similarly modernize the code, and we'll see that work continue in the next cycle as well. Nothing really jumps out as scary here, except that the significant churn in parent data descriptions can have typos that go unnoticed. More details below. Core: - Honor CLK_OPS_PARENT_ENABLE in clk_core_is_enabled() New Drivers: - Add a new clk-gpr-mux clock type and use it on i.MX6Q to add ENET ref clocks - Support for Mediatek MT7891 SoC clks - Support for many Qualcomm clk controllers: - QDU1000/QRU1000 global clock controller - SA8775P global clock controller - SM8550 TCSR and display clock controller - SM6350 clock controller - MSM8996 CBF and APCS clock controllers Updates: - Various cleanups and improvements to Mediatek clk drivers to reduce code size and modernize the drivers - Support for Versa 5P49V60 clks - Disable R-Car H3 ES1.*, as it was only available to an internal development group and needed a lot of quirks and workarounds - Add PWM, Compare-Match Timer (TIM), USB, SDHI, and eMMC clocks and resets on Renesas RZ/V2M - Add display clocks on Renesas R-Car V4H - Add Camera Receiving Unit (CRU) clocks and resets on Renesas RZ/G2L - Free the imx_uart_clocks even if imx_register_uart_clocks returns early - Get the stdout clocks count from device tree on i.MX - Drop the clock count argument from imx_register_uart_clocks() - Keep the uart clocks on i.MX93 for when earlycon is used - Fix SPDX comment in i.MX6SLL clocks bindings header - Drop some unnecessary spaces from i.MX8ULP clocks bindings header - Add imx_obtain_fixed_of_clock() for allowing to add a clock that is not configured via devicetree - Fix the ENET1 gate configuration for i.MX6UL according to the reference manual - Add ENET refclock mux support for i.MX6UL - Add support for USB host/device configuration on Renesas RZ/N1 - Add PLL2 programming support, and CAN-FD clocks on Renesas R-Car V4H - Add D1 CAN bus gates and resets for Allwinner - Mark D1 CPUX clock as critical on Allwinner - Reuse D1 driver for Allwinner R528/T113 - Cleanup sunxi-ng Kconfig - Fix sunxi-ng kernel-doc issues - Model Allwinner H3/H5 DRAM clock as fixed clock - Use .determine_rate() instead of .round_rate() for the dualdiv, mpll, sclk-div and cpu-dyn-div amlogic clock drivers - DDR clocks were marked as critical in the proper clock driver for each AT91 SoC such that drivers/memory/atmel-sdramc.c to be deleted in the next releases as it only does clock enablement - Patch to avoid compiling dt-compat.o for all AT91 SoCs as only some of them may use it - Support synchronous power_off requests in the qcom GDSC driver for proper GPU power collapse - Drop test clocks from various Qualcomm clk drivers - Update parent references to use clk_parent_data/clk_hw in various Qualcomm clk drivers - Fixes for the Qualcomm MSM8996 CPU clock controller - Transition Qualcomm MSM8974 GCC off the externally defined sleep_clk - Add GDSCs in the global clock controller for Qualcomm QCS404 - The SDCC core clocks on Qualcomm SM6115 are moved to floor_ops - Programming of clk_dis_wait for GPU CX GDSC on Qualcomm SC7180 and SDM845 are moved to use the recently introduced properties in the GDSC struct - Qualcomm's RPMh clock driver gains SM8550 and SA8775P clocks, and the IPA clock is added on a variety of platforms - De-duplicate identical clks in Qualcomm SMD RPM clk driver - Add a few missing clocks across msm8998, msm8992, msm8916, qcs404 to Qualcomm SDM RPM clk driver - Various Qualcomm clk drivers use devm_pm_runtime_enable() to simplify" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (228 commits) clk: qcom: apcs-msm8986: Include bitfield.h for FIELD_PREP clk: qcom: Revert sync_state based clk_disable_unused clk: imx: pll14xx: fix recalc_rate for negative kdiv clk: rs9: Drop unused pin_xin field MAINTAINERS: clk: imx: Add Peng Fan as reviewer clk: sprd: Add dependency for SPRD_UMS512_CLK clk: ralink: fix 'mt7621_gate_is_enabled()' function clk: mediatek: clk-mtk: Remove unneeded semicolon dt-bindings: clock: remove stih416 bindings dt-bindings: clock: add loongson-2 clock dt-bindings: clock: add loongson-2 clock include file clk: imx: fix compile testing imxrt1050 clk: Honor CLK_OPS_PARENT_ENABLE in clk_core_is_enabled() clk: imx: set imx_clk_gpr_mux_ops storage-class-specifier to static clk: renesas: rcar-gen3: Disable R-Car H3 ES1.* dt-bindings: clock: Merge qcom,gpucc-sm8350 into qcom,gpucc.yaml clk: qcom: gpucc-sdm845: fix clk_dis_wait being programmed for CX GDSC clk: qcom: gpucc-sc7180: fix clk_dis_wait being programmed for CX GDSC dt-bindings: clock: qcom,sa8775p-gcc: add the power-domains property clk: qcom: cpu-8996: add missing cputype include ...
| * arm64: dts: qcom: sm8250: Pad addresses to 8 hex digitsKonrad Dybcio2023-01-111-4/+4
| | | | | | | | | | | | | | | | Some addresses were 7-hex-digits long. Fix that. Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20221230135044.287874-1-konrad.dybcio@linaro.org
* | Merge branch 'for-linus' of ↵Linus Torvalds2023-02-2525-153/+216
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha Pull alpha updates from Al Viro: "Mostly small janitorial fixes but there's also more important ones: a patch to fix loading large modules from Edward Humes, and some fixes from Al Viro" [ The fixes from Al mostly came in separately through Al's trees too and are now duplicated.. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha: alpha: in_irq() cleanup alpha: lazy FPU switching alpha/boot/misc: trim unused declarations alpha/boot/tools/objstrip: fix the check for ELF header alpha/boot: fix the breakage from -isystem series... alpha: fix FEN fault handling alpha: Avoid comma separated statements alpha: fixed a typo in core_cia.c alpha: remove unused __SLOW_DOWN_IO and SLOW_DOWN_IO definitions alpha: update config files alpha: fix R_ALPHA_LITERAL reloc for large modules alpha: Add some spaces to ensure format specification alpha: replace NR_SYSCALLS by NR_syscalls alpha: Remove redundant local asm header redirections alpha: Implement "current_stack_pointer" alpha: remove redundant err variable alpha: osf_sys: reduce kernel log spamming on invalid osf_mount call typenr
| * | alpha: in_irq() cleanupChangbin Du2023-02-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Replace the obsolete and ambiguos macro in_irq() with new macro in_hardirq(). Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha: lazy FPU switchingAl Viro2023-02-259-119/+192
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On each context switch we save the FPU registers on stack of old process and restore FPU registers from the stack of new one. That allows us to avoid doing that each time we enter/leave the kernel mode; however, that can get suboptimal in some cases. For one thing, we don't need to bother saving anything for kernel threads. For another, if between entering and leaving the kernel a thread gives CPU up more than once, it will do useless work, saving the same values every time, only to discard the saved copy as soon as it returns from switch_to(). Alternative solution: * move the array we save into from switch_stack to thread_info * have a (thread-synchronous) flag set when we save them * have another flag set when they should be restored on return to userland. * do *NOT* save/restore them in do_switch_stack()/undo_switch_stack(). * restore on the exit to user mode if the restore flag had been set. Clear both flags. * on context switch, entry to fork/clone/vfork, before entry into do_signal() and on entry into straced syscall save the registers and set the 'saved' flag unless it had been already set. * on context switch set the 'restore' flag as well. * have copy_thread() set both flags for child, so the registers would be restored once the child returns to userland. * use the saved data in setup_sigcontext(); have restore_sigcontext() set both flags and copy from sigframe to save area. * teach ptrace to look for FPU registers in thread_info instead of switch_stack. * teach isolated accesses to FPU registers (rdfpcr, wrfpcr, etc.) to check the 'saved' flag (under preempt_disable()) and work with the save area if it's been set; if 'saved' flag is found upon write access, set 'restore' flag as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha/boot/misc: trim unused declarationsAl Viro2023-02-251-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | gzip_mark() and gzip_release() are gone; there used to be two forward declarations of each and the patch removing those suckers had left one of each behind... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha/boot/tools/objstrip: fix the check for ELF headerAl Viro2023-02-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just memcmp() with ELFMAG - that's the normal way to do it in userland code, which that thing is. Besides, that has the benefit of actually building - str_has_prefix() is *NOT* present in <string.h>. Fixes: 5f14596e55de "alpha: Replace strncmp with str_has_prefix" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha/boot: fix the breakage from -isystem series...Al Viro2023-02-254-5/+5
| | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha: fix FEN fault handlingAl Viro2023-02-251-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Type 3 instruction fault (FPU insn with FPU disabled) is handled by quietly enabling FPU and returning. Which is fine, except that we need to do that both for fault in userland and in the kernel; the latter *can* legitimately happen - all it takes is this: .global _start _start: call_pal 0xae lda $0, 0 ldq $0, 0($0) - call_pal CLRFEN to clear "FPU enabled" flag and arrange for a signal delivery (SIGSEGV in this case). Fixed by moving the handling of type 3 into the common part of do_entIF(), before we check for kernel vs. user mode. Incidentally, check for kernel mode is unidiomatic; the normal way to do that is !user_mode(regs). The difference is that the open-coded variant treats any of bits 63..3 of regs->ps being set as "it's user mode" while the normal approach is to check just the bit 3. PS is a 4-bit register and regs->ps always will have bits 63..4 clear, so the open-code variant here is actually equivalent to !user_mode(regs). Harder to follow, though... Reproducer above will crash any box where CLRFEN is not ignored by PAL (== any actual hardware, AFAICS; PAL used in qemu doesn't bother implementing that crap). Cc: stable@vger.kernel.org # all way back... Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha: Avoid comma separated statementsJoe Perches2023-02-251-3/+5
| | | | | | | | | | | | | | | | | | | | | Use semicolons and braces. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha: fixed a typo in core_cia.crj12023-02-251-1/+1
| | | | | | | | | | | | Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha: remove unused __SLOW_DOWN_IO and SLOW_DOWN_IO definitionsBjorn Helgaas2023-02-141-4/+0
| | | | | | | | | | | | | | | | | | | | | Remove unused __SLOW_DOWN_IO and SLOW_DOWN_IO definitions. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha: update config filesLukas Bulwahn2023-02-141-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up config files by: - removing configs that were deleted in the past - removing configs not in tree and without recently pending patches - adding new configs that are replacements for old configs in the file For some detailed information, see Link. Link: https://lore.kernel.org/kernel-janitors/20220929090645.1389-1-lukas.bulwahn@gmail.com/ Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha: fix R_ALPHA_LITERAL reloc for large modulesEdward Humes2023-02-141-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, R_ALPHA_LITERAL relocations would overflow for large kernel modules. This was because the Alpha's apply_relocate_add was relying on the kernel's module loader to have sorted the GOT towards the very end of the module as it was mapped into memory in order to correctly assign the global pointer. While this behavior would mostly work fine for small kernel modules, this approach would overflow on kernel modules with large GOT's since the global pointer would be very far away from the GOT, and thus, certain entries would be out of range. This patch fixes this by instead using the Tru64 behavior of assigning the global pointer to be 32KB away from the start of the GOT. The change made in this patch won't work for multi-GOT kernel modules as it makes the assumption the module only has one GOT located at the beginning of .got, although for the vast majority kernel modules, this should be fine. Of the kernel modules that would previously result in a relocation error, none of them, even modules like nouveau, have even come close to filling up a single GOT, and they've all worked fine under this patch. Signed-off-by: Edward Humes <aurxenon@lunos.org> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha: Add some spaces to ensure format specificationZhang Jiaming2023-02-141-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | Add a space after ','. Add spaces around the '=', '>' and '=='. Signed-off-by: Zhang Jiaming <jiaming@nfschina.com> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha: replace NR_SYSCALLS by NR_syscallsYang Yang2023-02-142-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reference to other arch likes x86_64 or arm64 to do this replacement. To solve compile error when using NR_syscalls in kernel[1]. [1] https://lore.kernel.org/all/202203270449.WBYQF9X3-lkp@intel.com/ Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha: Remove redundant local asm header redirectionsMaciej W. Rozycki2023-02-145-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove a number of asm headers locally redirected to the respective generic or generated versions. For asm-offsets.h all that is needed is a Kbuild entry for the generic version, and for div64.h, irq_regs.h and kdebug.h nothing is needed as in their absence they will be redirected automatically according to include/asm-generic/Kbuild. Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha: Implement "current_stack_pointer"Kees Cook2023-02-143-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To follow the existing per-arch conventions replace open-coded use of asm "$30" as "current_stack_pointer". This will let it be used in non-arch places (like HARDENED_USERCOPY). Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: "Alexander A. Klimov" <grandmaster@al2klimov.de> Cc: linux-alpha@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha: remove redundant err variableMinghao Chi2023-02-141-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return value from __hw_perf_event_init() directly instead of taking this in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | alpha: osf_sys: reduce kernel log spamming on invalid osf_mount call typenrColin Ian King2023-02-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling the osf_mount system call with an invalid typenr value will spam the kernel log with error messages. Reduce the spamming by making it a ratelimited printk. Issue found when exercising with the stress-ng enosys system call stressor. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Matt Turner <mattst88@gmail.com>
* | | Merge tag 'vfio-v6.3-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds2023-02-253-4/+6
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull VFIO updates from Alex Williamson: - Remove redundant resource check in vfio-platform (Angus Chen) - Use GFP_KERNEL_ACCOUNT for persistent userspace allocations, allowing removal of arbitrary kernel limits in favor of cgroup control (Yishai Hadas) - mdev tidy-ups, including removing the module-only build restriction for sample drivers, Kconfig changes to select mdev support, documentation movement to keep sample driver usage instructions with sample drivers rather than with API docs, remove references to out-of-tree drivers in docs (Christoph Hellwig) - Fix collateral breakages from mdev Kconfig changes (Arnd Bergmann) - Make mlx5 migration support match device support, improve source and target flows to improve pre-copy support and reduce downtime (Yishai Hadas) - Convert additional mdev sysfs case to use sysfs_emit() (Bo Liu) - Resolve copy-paste error in mdev mbochs sample driver Kconfig (Ye Xingchen) - Avoid propagating missing reset error in vfio-platform if reset requirement is relaxed by module option (Tomasz Duszynski) - Range size fixes in mlx5 variant driver for missed last byte and stricter range calculation (Yishai Hadas) - Fixes to suspended vaddr support and locked_vm accounting, excluding mdev configurations from the former due to potential to indefinitely block kernel threads, fix underflow and restore locked_vm on new mm (Steve Sistare) - Update outdated vfio documentation due to new IOMMUFD interfaces in recent kernels (Yi Liu) - Resolve deadlock between group_lock and kvm_lock, finally (Matthew Rosato) - Fix NULL pointer in group initialization error path with IOMMUFD (Yan Zhao) * tag 'vfio-v6.3-rc1' of https://github.com/awilliam/linux-vfio: (32 commits) vfio: Fix NULL pointer dereference caused by uninitialized group->iommufd docs: vfio: Update vfio.rst per latest interfaces vfio: Update the kdoc for vfio_device_ops vfio/mlx5: Fix range size calculation upon tracker creation vfio: no need to pass kvm pointer during device open vfio: fix deadlock between group lock and kvm lock vfio: revert "iommu driver notify callback" vfio/type1: revert "implement notify callback" vfio/type1: revert "block on invalid vaddr" vfio/type1: restore locked_vm vfio/type1: track locked_vm per dma vfio/type1: prevent underflow of locked_vm via exec() vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR vfio: platform: ignore missing reset if disabled at module init vfio/mlx5: Improve the target side flow to reduce downtime vfio/mlx5: Improve the source side flow upon pre_copy vfio/mlx5: Check whether VF is migratable samples: fix the prompt about SAMPLE_VFIO_MDEV_MBOCHS vfio/mdev: Use sysfs_emit() to instead of sprintf() vfio-mdev: add back CONFIG_VFIO dependency ...
| * | | vfio-mdev: add back CONFIG_VFIO dependencyArnd Bergmann2023-01-301-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_VFIO_MDEV cannot be selected when VFIO itself is disabled, otherwise we get a link failure: WARNING: unmet direct dependencies detected for VFIO_MDEV Depends on [n]: VFIO [=n] Selected by [y]: - SAMPLE_VFIO_MDEV_MTTY [=y] && SAMPLES [=y] - SAMPLE_VFIO_MDEV_MDPY [=y] && SAMPLES [=y] - SAMPLE_VFIO_MDEV_MBOCHS [=y] && SAMPLES [=y] /home/arnd/cross/arm64/gcc-13.0.1-nolibc/x86_64-linux/bin/x86_64-linux-ld: samples/vfio-mdev/mdpy.o: in function `mdpy_remove': mdpy.c:(.text+0x1e1): undefined reference to `vfio_unregister_group_dev' /home/arnd/cross/arm64/gcc-13.0.1-nolibc/x86_64-linux/bin/x86_64-linux-ld: samples/vfio-mdev/mdpy.o: in function `mdpy_probe': mdpy.c:(.text+0x149e): undefined reference to `_vfio_alloc_device' Fixes: 8bf8c5ee1f38 ("vfio-mdev: turn VFIO_MDEV into a selectable symbol") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230126211211.1762319-1-arnd@kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | | vfio-mdev: turn VFIO_MDEV into a selectable symbolChristoph Hellwig2023-01-233-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VFIO_MDEV is just a library with helpers for the drivers. Stop making it a user choice and just select it by the drivers that use the helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Link: https://lore.kernel.org/r/20230110091009.474427-3-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2023-02-25158-2291/+5357
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm updates from Paolo Bonzini: "ARM: - Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogenous systems. Non secure software has no practical need to traverse the caches by set/way in the first place - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company) - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems - Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add [Oliver] as co-maintainer RISC-V: - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE - Correctly place the guest in S-mode after redirecting a trap to the guest - Redirect illegal instruction traps to guest - SBI PMU support for guest s390: - Sort out confusion between virtual and physical addresses, which currently are the same on s390 - A new ioctl that performs cmpxchg on guest memory - A few fixes x86: - Change tdp_mmu to a read-only parameter - Separate TDP and shadow MMU page fault paths - Enable Hyper-V invariant TSC control - Fix a variety of APICv and AVIC bugs, some of them real-world, some of them affecting architecurally legal but unlikely to happen in practice - Mark APIC timer as expired if its in one-shot mode and the count underflows while the vCPU task was being migrated - Advertise support for Intel's new fast REP string features - Fix a double-shootdown issue in the emergency reboot code - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM similar treatment to VMX - Update Xen's TSC info CPUID sub-leaves as appropriate - Add support for Hyper-V's extended hypercalls, where "support" at this point is just forwarding the hypercalls to userspace - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and MSR filters - One-off fixes and cleanups - Fix and cleanup the range-based TLB flushing code, used when KVM is running on Hyper-V - Add support for filtering PMU events using a mask. If userspace wants to restrict heavily what events the guest can use, it can now do so without needing an absurd number of filter entries - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU support is disabled - Add PEBS support for Intel Sapphire Rapids - Fix a mostly benign overflow bug in SEV's send|receive_update_data() - Move several SVM-specific flags into vcpu_svm x86 Intel: - Handle NMI VM-Exits before leaving the noinstr region - A few trivial cleanups in the VM-Enter flows - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support EPTP switching (or any other VM function) for L1 - Fix a crash when using eVMCS's enlighted MSR bitmaps Generic: - Clean up the hardware enable and initialization flow, which was scattered around multiple arch-specific hooks. Instead, just let the arch code call into generic code. Both x86 and ARM should benefit from not having to fight common KVM code's notion of how to do initialization - Account allocations in generic kvm_arch_alloc_vm() - Fix a memory leak if coalesced MMIO unregistration fails selftests: - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit the correct hypercall instruction instead of relying on KVM to patch in VMMCALL - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits) KVM: SVM: hyper-v: placate modpost section mismatch error KVM: x86/mmu: Make tdp_mmu_allowed static KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes KVM: arm64: nv: Filter out unsupported features from ID regs KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 KVM: arm64: nv: Allow a sysreg to be hidden from userspace only KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2 KVM: arm64: nv: Handle SMCs taken from virtual EL2 KVM: arm64: nv: Handle trapped ERET from virtual EL2 KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 KVM: arm64: nv: Support virtual EL2 exceptions KVM: arm64: nv: Handle HCR_EL2.NV system register traps KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state KVM: arm64: nv: Add EL2 system registers to vcpu context KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set KVM: arm64: nv: Introduce nested virtualization VCPU feature KVM: arm64: Use the S2 MMU context to iterate over S2 table ...
| * | | | KVM: SVM: hyper-v: placate modpost section mismatch errorRandy Dunlap2023-02-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | modpost reports section mismatch errors/warnings: WARNING: modpost: vmlinux.o: section mismatch in reference: svm_hv_hardware_setup (section: .text) -> (unknown) (section: .init.data) WARNING: modpost: vmlinux.o: section mismatch in reference: svm_hv_hardware_setup (section: .text) -> (unknown) (section: .init.data) WARNING: modpost: vmlinux.o: section mismatch in reference: svm_hv_hardware_setup (section: .text) -> (unknown) (section: .init.data) This "(unknown) (section: .init.data)" all refer to svm_x86_ops. Tag svm_hv_hardware_setup() with __init to fix a modpost warning as the non-stub implementation accesses __initdata (svm_x86_ops), i.e. would generate a use-after-free if svm_hv_hardware_setup() were actually invoked post-init. The helper is only called from svm_hardware_setup(), which is also __init, i.e. lack of __init is benign other than the modpost warning. Fixes: 1e0c7d40758b ("KVM: SVM: hyper-v: Remote TLB flush for SVM") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vineeth Pillai <viremana@linux.microsoft.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20230222073315.9081-1-rdunlap@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | Merge tag 'kvm-x86-apic-6.3' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini2023-02-223-49/+70
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM x86 APIC changes for 6.3: - Remove a superfluous variables from apic_get_tmcct() - Fix various edge cases in x2APIC MSR emulation - Mark APIC timer as expired if its in one-shot mode and the count underflows while the vCPU task was being migrated - Reset xAPIC when userspace forces "impossible" x2APIC => xAPIC transition
| | * | | | KVM: x86: Reinitialize xAPIC ID when userspace forces x2APIC => xAPICEmanuele Giuseppe Esposito2023-02-021-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reinitialize the xAPIC ID to the vCPU ID when userspace forces the APIC to transition directly from x2APIC to xAPIC mode, e.g. to emulate RESET. KVM already stuffs the xAPIC ID when the APIC is transitioned from DISABLED to xAPIC (commit 49bd29ba1dbd ("KVM: x86: reset APIC ID when enabling LAPIC")), i.e. userspace is conditioned to expect KVM to update the xAPIC ID, but KVM doesn't handle the architecturally-impossible case where userspace forces x2APIC=>xAPIC via KVM_SET_MSRS. On its own, the "bug" is benign, as userspace emulation of RESET will also stuff APIC registers via KVM_SET_LAPIC, i.e. will manually set the xAPIC ID. However, commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") introduced a bug, fixed by commit commit ef40757743b4 ("KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself"), that caused KVM to fail to properly update the xAPIC ID when handling KVM_SET_LAPIC. Refresh the xAPIC ID even though it's not strictly necessary so that KVM provides consistent behavior. Note, KVM follows Intel architecture with regard to handling the xAPIC ID and x2APIC IDs across mode transitions. For the APIC DISABLED case (commit 49bd29ba1dbd), Intel's SDM says the xAPIC ID _may_ be reinitialized 10.4.3 Enabling or Disabling the Local APIC When IA32_APIC_BASE[11] is set to 0, prior initialization to the APIC may be lost and the APIC may return to the state described in Section 10.4.7.1, “Local APIC State After Power-Up or Reset.” 10.4.7.1 Local APIC State After Power-Up or Reset ... The local APIC ID register is set to a unique APIC ID. ... i.e. KVM's behavior is legal as per Intel's architecture. In practice, Intel's behavior is N/A as modern Intel CPUs (since at least Haswell) make the xAPIC ID fully read-only. And for xAPIC => x2APIC transitions (commit 257b9a5faab5 ("KVM: x86: use correct APIC ID on x2APIC transition")), Intel's SDM says: Any APIC ID value written to the memory-mapped local APIC ID register is not preserved. AMD's APM says nothing (that I could find) about the xAPIC ID when the APIC is DISABLED, but testing on bare metal (Rome) shows that the xAPIC ID is preserved when the APIC is DISABLED and re-enabled in xAPIC mode. AMD also preserves the xAPIC ID when the APIC is transitioned from xAPIC to x2APIC, i.e. allows a backdoor write of the x2APIC ID, which is again not emulated by KVM. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Link: https://lore.kernel.org/all/20230109130605.2013555-2-eesposit@redhat.com [sean: rewrite changelog, set xAPIC ID iff APIC is enabled] Signed-off-by: Sean Christopherson <seanjc@google.com>
| | * | | | KVM: x86: fire timer when it is migrated and expired, and in oneshot modeLi RongQing2023-01-241-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | when the vCPU was migrated, if its timer is expired, KVM _should_ fire the timer ASAP, zeroing the deadline here will cause the timer to immediately fire on the destination Cc: Sean Christopherson <seanjc@google.com> Cc: Peter Shier <pshier@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Link: https://lore.kernel.org/r/20230106040625.8404-1-lirongqing@baidu.com Signed-off-by: Sean Christopherson <seanjc@google.com>
| | * | | | KVM: VMX: Intercept reads to invalid and write-only x2APIC registersSean Christopherson2023-01-241-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intercept reads to invalid (non-existent) and write-only x2APIC registers when configuring VMX's MSR bitmaps for x2APIC+APICv. When APICv is fully enabled, Intel hardware doesn't validate the registers on RDMSR and instead blindly retrieves data from the vAPIC page, i.e. it's software's responsibility to intercept reads to non-existent and write-only MSRs. Fixes: 8d14695f9542 ("x86, apicv: add virtual x2apic support") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20230107011025.565472-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
| | * | | | KVM: VMX: Always intercept accesses to unsupported "extended" x2APIC regsSean Christopherson2023-01-241-18/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't clear the "read" bits for x2APIC registers above SELF_IPI (APIC regs 0x400 - 0xff0, MSRs 0x840 - 0x8ff). KVM doesn't emulate registers in that space (there are a smattering of AMD-only extensions) and so should intercept reads in order to inject #GP. When APICv is fully enabled, Intel hardware doesn't validate the registers on RDMSR and instead blindly retrieves data from the vAPIC page, i.e. it's software's responsibility to intercept reads to non-existent MSRs. Fixes: 8d14695f9542 ("x86, apicv: add virtual x2apic support") Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://lore.kernel.org/r/20230107011025.565472-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
| | * | | | KVM: x86: Split out logic to generate "readable" APIC regs mask to helperSean Christopherson2023-01-242-13/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the generation of the readable APIC regs bitmask to a standalone helper so that VMX can use the mask for its MSR interception bitmaps. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20230107011025.565472-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
| | * | | | KVM: x86: Mark x2APIC DFR reg as non-existent for x2APICSean Christopherson2023-01-241-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark APIC_DFR as being invalid/non-existent in x2APIC mode instead of handling it as a one-off check in kvm_x2apic_msr_read(). This will allow reusing "valid_reg_mask" to generate VMX's interception bitmaps for x2APIC. Handling DFR in the common read path may also fix the Hyper-V PV MSR interface, if that can coexist with x2APIC. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20230107011025.565472-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
| | * | | | KVM: x86: Inject #GP on x2APIC WRMSR that sets reserved bits 63:32Sean Christopherson2023-01-241-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reject attempts to set bits 63:32 for 32-bit x2APIC registers, i.e. all x2APIC registers except ICR. Per Intel's SDM: Non-zero writes (by WRMSR instruction) to reserved bits to these registers will raise a general protection fault exception Opportunistically fix a typo in a nearby comment. Reported-by: Marc Orr <marcorr@google.com> Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20230107011025.565472-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
| | * | | | KVM: x86: Inject #GP if WRMSR sets reserved bits in APIC Self-IPISean Christopherson2023-01-241-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Inject a #GP if the guest attempts to set reserved bits in the x2APIC-only Self-IPI register. Bits 7:0 hold the vector, all other bits are reserved. Reported-by: Marc Orr <marcorr@google.com> Cc: Ben Gardon <bgardon@google.com> Cc: Venkatesh Srinivas <venkateshs@chromium.org> Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20230107011025.565472-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
| | * | | | KVM: x86: remove redundant ret variablezhang songyi2023-01-241-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return value from apic_get_tmcct() directly instead of taking this in another redundant variable. Signed-off-by: zhang songyi <zhang.songyi@zte.com.cn> Link: https://lore.kernel.org/r/202211231704457807160@zte.com.cn Signed-off-by: Sean Christopherson <seanjc@google.com>
| * | | | | Merge tag 'kvmarm-6.3' of ↵Paolo Bonzini2023-02-2058-366/+1875
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.3 - Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogenous systems. Non secure software has no practical need to traverse the caches by set/way in the first place. - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company). - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests. - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM. - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested. - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems. - Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor. - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host. - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add [Oliver] as co-maintainer This also drags in arm64's 'for-next/sme2' branch, because both it and the PSCI relay changes touch the EL2 initialization code.
| | * \ \ \ \ Merge branch kvm-arm64/nv-prefix into kvmarm/nextOliver Upton2023-02-1425-46/+1036
| | |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * kvm-arm64/nv-prefix: : Preamble to NV support, courtesy of Marc Zyngier. : : This brings in a set of prerequisite patches for supporting nested : virtualization in KVM/arm64. Of course, there is a long way to go until : NV is actually enabled in KVM. : : - Introduce cpucap / vCPU feature flag to pivot the NV code on : : - Add support for EL2 vCPU register state : : - Basic nested exception handling : : - Hide unsupported features from the ID registers for NV-capable VMs KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes KVM: arm64: nv: Filter out unsupported features from ID regs KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 KVM: arm64: nv: Allow a sysreg to be hidden from userspace only KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2 KVM: arm64: nv: Handle SMCs taken from virtual EL2 KVM: arm64: nv: Handle trapped ERET from virtual EL2 KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 KVM: arm64: nv: Support virtual EL2 exceptions KVM: arm64: nv: Handle HCR_EL2.NV system register traps KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state KVM: arm64: nv: Add EL2 system registers to vcpu context KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set KVM: arm64: nv: Introduce nested virtualization VCPU feature KVM: arm64: Use the S2 MMU context to iterate over S2 table arm64: Add ARM64_HAS_NESTED_VIRT cpufeature Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| | | * | | | | KVM: arm64: nv: Use reg_to_encoding() to get sysreg IDOliver Upton2023-02-111-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid open-coding and just use the helper to encode the ID from the sysreg table entry. No functional change intended. Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230211190742.49843-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| | | * | | | | KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changesChristoffer Dall2023-02-111-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far we were flushing almost the entire universe whenever a VM would load/unload the SCTLR_EL1 and the two versions of that register had different MMU enabled settings. This turned out to be so slow that it prevented forward progress for a nested VM, because a scheduler timer tick interrupt would always be pending when we reached the nested VM. To avoid this problem, we consider the SCTLR_EL2 when evaluating if caches are on or off when entering virtual EL2 (because this is the value that we end up shadowing onto the hardware EL1 register). Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-19-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| | | * | | | | KVM: arm64: nv: Filter out unsupported features from ID regsMarc Zyngier2023-02-114-1/+172
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As there is a number of features that we either can't support, or don't want to support right away with NV, let's add some basic filtering so that we don't advertize silly things to the EL2 guest. Whilst we are at it, advertize FEAT_TTL as well as FEAT_GTG, which the NV implementation will implement. Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-18-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| | | * | | | | KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2Jintack Lim2023-02-111-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With HCR_EL2.NV bit set, accesses to EL12 registers in the virtual EL2 trap to EL2. Handle those traps just like we do for EL1 registers. One exception is CNTKCTL_EL12. We don't trap on CNTKCTL_EL1 for non-VHE virtual EL2 because we don't have to. However, accessing CNTKCTL_EL12 will trap since it's one of the EL12 registers controlled by HCR_EL2.NV bit. Therefore, add a handler for it and don't treat it as a non-trap-registers when preparing a shadow context. These registers, being only a view on their EL1 counterpart, are permanently hidden from userspace. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> [maz: EL12_REG(), register visibility] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-17-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| | | * | | | | KVM: arm64: nv: Allow a sysreg to be hidden from userspace onlyMarc Zyngier2023-02-112-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far, we never needed to distinguish between registers hidden from userspace and being hidden from a guest (they are always either visible to both, or hidden from both). With NV, we have the ugly case of the EL02 and EL12 registers, which are only a view on the EL0 and EL1 registers. It makes absolutely no sense to expose them to userspace, since it already has the canonical view. Add a new visibility flag (REG_HIDDEN_USER) and a new helper that checks for it and REG_HIDDEN when checking whether to expose a sysreg to userspace. Subsequent patches will make use of it. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-16-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| | | * | | | | KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisorMarc Zyngier2023-02-113-1/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can no longer blindly copy the VCPU's PSTATE into SPSR_EL2 and return to the guest and vice versa when taking an exception to the hypervisor, because we emulate virtual EL2 in EL1 and therefore have to translate the mode field from EL2 to EL1 and vice versa. This requires keeping track of the state we enter the guest, for which we transiently use a dedicated flag. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-15-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| | | * | | | | KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from ↵Jintack Lim2023-02-111-1/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtual EL2 For the same reason we trap virtual memory register accesses at virtual EL2, we need to trap SPSR_EL1, ELR_EL1 and VBAR_EL1 accesses. ARM v8.3 introduces the HCR_EL2.NV1 bit to be able to trap on those register accesses in EL1. Do not set this bit until the whole nesting support is completed, which happens further down the line... Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-14-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| | | * | | | | KVM: arm64: nv: Handle SMCs taken from virtual EL2Jintack Lim2023-02-111-2/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Non-nested guests have used the hvc instruction to initiate SMCCC calls into KVM. This is quite a poor fit for NV as hvc exceptions are always taken to EL2. In other words, KVM needs to unconditionally forward the hvc exception back into vEL2 to uphold the architecture. Instead, treat the smc instruction from vEL2 as we would a guest hypercall, thereby allowing the vEL2 to interact with KVM's hypercall surface. Note that on NV-capable hardware HCR_EL2.TSC causes smc instructions executed in non-secure EL1 to trap to EL2, even if EL3 is not implemented. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-13-maz@kernel.org [Oliver: redo commit message, only handle smc from vEL2] Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| | | * | | | | KVM: arm64: nv: Handle trapped ERET from virtual EL2Christoffer Dall2023-02-113-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a guest hypervisor running virtual EL2 in EL1 executes an ERET instruction, we will have set HCR_EL2.NV which traps ERET to EL2, so that we can emulate the exception return in software. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-12-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| | | * | | | | KVM: arm64: nv: Inject HVC exceptions to the virtual EL2Jintack Lim2023-02-111-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we expect all PSCI calls from the L1 hypervisor to be performed using SMC when nested virtualization is enabled, it is clear that all HVC instruction from the VM (including from the virtual EL2) are supposed to handled in the virtual EL2. Forward these to EL2 as required. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> [maz: add handling of HCR_EL2.HCD] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-11-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| | | * | | | | KVM: arm64: nv: Support virtual EL2 exceptionsJintack Lim2023-02-118-20/+382
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support injecting exceptions and performing exception returns to and from virtual EL2. This must be done entirely in software except when taking an exception from vEL0 to vEL2 when the virtual HCR_EL2.{E2H,TGE} == {1,1} (a VHE guest hypervisor). [maz: switch to common exception injection framework, illegal exeption return handling] Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-10-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
| | | * | | | | KVM: arm64: nv: Handle HCR_EL2.NV system register trapsJintack Lim2023-02-112-6/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ARM v8.3 introduces a new bit in the HCR_EL2, which is the NV bit. When this bit is set, accessing EL2 registers in EL1 traps to EL2. In addition, executing the following instructions in EL1 will trap to EL2: tlbi, at, eret, and msr/mrs instructions to access SP_EL1. Most of the instructions that trap to EL2 with the NV bit were undef at EL1 prior to ARM v8.3. The only instruction that was not undef is eret. This patch sets up a handler for EL2 registers and SP_EL1 register accesses at EL1. The host hypervisor keeps those register values in memory, and will emulate their behavior. This patch doesn't set the NV bit yet. It will be set in a later patch once nested virtualization support is completed. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> [maz: EL2_REG() macros] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-9-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>