linux - linux

	Commit message (Collapse)	Author	Files	Lines
2021-07-06	KVM: selftests: introduce P44V64 for z196 and EC12	Christian Borntraeger	3	-1/+23
	Older machines like z196 and zEC12 do only support 44 bits of physical addresses. Make this the default and check via IBC if we are on a later machine. We then add P47V64 as an additional model. Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Link: https://lore.kernel.org/kvm/20210701153853.33063-1-borntraeger@de.ibm.com/ Fixes: 1bc603af73dd ("KVM: selftests: introduce P47V64 for s390x")
2021-06-25	KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled	Maxim Levitsky	3	-5/+5
	This better reflects the purpose of this variable on AMD, since on AMD the AVIC's memory slot can be enabled and disabled dynamically. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210623113002.111448-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	kvm: x86: disable the narrow guest module parameter on unload	Aaron Lewis	1	-0/+2
	When the kvm_intel module unloads the module parameter 'allow_smaller_maxphyaddr' is not cleared because the backing variable is defined in the kvm module. As a result, if the module parameter's state was set before kvm_intel unloads, it will also be set when it reloads. Explicitly clear the state in vmx_exit() to prevent this from happening. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Message-Id: <20210623203426.1891402-1-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com>
2021-06-25	selftests: kvm: Allows userspace to handle emulation errors.	Aaron Lewis	5	-0/+317
	This test exercises the feature KVM_CAP_EXIT_ON_EMULATION_FAILURE. When enabled, errors in the in-kernel instruction emulator are forwarded to userspace with the instruction bytes stored in the exit struct for KVM_EXIT_INTERNAL_ERROR. So, when the guest attempts to emulate an 'flds' instruction, which isn't able to be emulated in KVM, instead of failing, KVM sends the instruction to userspace to handle. For this test to work properly the module parameter 'allow_smaller_maxphyaddr' has to be set. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20210510144834.658457-3-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	kvm: x86: Allow userspace to handle emulation errors	Aaron Lewis	4	-4/+85
	Add a fallback mechanism to the in-kernel instruction emulator that allows userspace the opportunity to process an instruction the emulator was unable to. When the in-kernel instruction emulator fails to process an instruction it will either inject a #UD into the guest or exit to userspace with exit reason KVM_INTERNAL_ERROR. This is because it does not know how to proceed in an appropriate manner. This feature lets userspace get involved to see if it can figure out a better path forward. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210510144834.658457-2-aaronlewis@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on	Sean Christopherson	1	-3/+17
	Let the guest use 1g hugepages if TDP is enabled and the host supports GBPAGES, KVM can't actively prevent the guest from using 1g pages in this case since they can't be disabled in the hardware page walker. While injecting a page fault if a bogus 1g page is encountered during a software page walk is perfectly reasonable since KVM is simply honoring userspace's vCPU model, doing so arguably doesn't provide any meaningful value, and at worst will be horribly confusing as the guest will see inconsistent behavior and seemingly spurious page faults. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-55-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault	Sean Christopherson	1	-1/+1
	Use the current MMU instead of vCPU state to query CR4.SMEP when handling a page fault. In the nested NPT case, the current CR4.SMEP reflects L2, whereas the page fault is shadowing L1's NPT, which uses L1's hCR4. Practically speaking, this is a nop a NPT walks are always user faults, i.e. this code will never be reached, but fix it up for consistency. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-54-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault	Sean Christopherson	2	-8/+2
	Use the current MMU instead of vCPU state to query CR0.WP when handling a page fault. In the nested NPT case, the current CR0.WP reflects L2, whereas the page fault is shadowing L1's NPT. Practically speaking, this is a nop a NPT walks are always user faults, but fix it up for consistency. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-53-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT	Sean Christopherson	1	-6/+0
	Drop the extra reset of shadow_zero_bits in the nested NPT flow now that shadow_mmu_init_context computes the correct level for nested NPT. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-52-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic	Sean Christopherson	3	-35/+30
	Drop the pre-computed last_nonleaf_level, which is arguably wrong and at best confusing. Per the comment: Can have large pages at levels 2..last_nonleaf_level-1. the intent of the variable would appear to be to track what levels can _legally_ have large pages, but that intent doesn't align with reality. The computed value will be wrong for 5-level paging, or if 1gb pages are not supported. The flawed code is not a problem in practice, because except for 32-bit PSE paging, bit 7 is reserved if large pages aren't supported at the level. Take advantage of this invariant and simply omit the level magic math for 64-bit page tables (including PAE). For 32-bit paging (non-PAE), the adjustments are needed purely because bit 7 is ignored if PSE=0. Retain that logic as is, but make is_last_gpte() unique per PTTYPE so that the PSE check is avoided for PAE and EPT paging. In the spirit of avoiding branches, bump the "last nonleaf level" for 32-bit PSE paging by adding the PSE bit itself. Note, bit 7 is ignored or has other meaning in CR3/EPTP, but despite FNAME(walk_addr_generic) briefly grabbing CR3/EPTP in "pte", they are not PTEs and will blow up all the other gpte helpers. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-51-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86: Enhance comments for MMU roles and nested transition trickiness	Sean Christopherson	3	-10/+49
	Expand the comments for the MMU roles. The interactions with gfn_track PGD reuse in particular are hairy. Regarding PGD reuse, add comments in the nested virtualization flows to call out why kvm_init_mmu() is unconditionally called even when nested TDP is used. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-50-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE	Sean Christopherson	1	-1/+4
	Replace make_spte()'s WARN on a collision with the magic MMIO value with a generic WARN on reserved bits being set (including EPT's reserved WX combination). Warning on any reserved bits covers MMIO, A/D tracking bits with PAE paging, and in theory any future goofs that are introduced. Opportunistically convert to ONCE behavior to avoid spamming the kernel log, odds are very good that if KVM screws up one SPTE, it will botch all SPTEs for the same MMU. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-49-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU	Sean Christopherson	2	-21/+34
	Extract the reserved SPTE check and print helpers in get_mmio_spte() to new helpers so that KVM can also WARN on reserved badness when making a SPTE. Tag the checking helper with __always_inline to improve the probability of the compiler generating optimal code for the checking loop, e.g. gcc appears to avoid using %rbp when the helper is tagged with a vanilla "inline". No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-48-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Use MMU's role to determine PTTYPE	Sean Christopherson	1	-4/+4
	Use the MMU's role instead of vCPU state or role_regs to determine the PTTYPE, i.e. which helpers to wire up. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-47-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers	Sean Christopherson	1	-17/+2
	Skip paging32E_init_context() and paging64_init_context_common() and go directly to paging64_init_context() (was the common version) now that the relevant flows don't need to distinguish between 64-bit PAE and 32-bit PAE for other reasons. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-46-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Add a helper to calculate root from role_regs	Sean Christopherson	1	-35/+25
	Add a helper to calculate the level for non-EPT page tables from the MMU's role_regs. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-45-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Add helper to update paging metadata	Sean Christopherson	1	-18/+15
	Consolidate MMU guest metadata updates into a common helper for TDP, shadow, and nested MMUs. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-44-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0	Sean Christopherson	1	-10/+10
	Don't bother updating the bitmasks and last-leaf information if paging is disabled as the metadata will never be used. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-43-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls	Sean Christopherson	1	-11/+10
	Move calls to reset_rsvds_bits_mask() out of the various mode statements and under a more generic CR0.PG=1 check. This will allow for additional code consolidation in the future. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-42-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper	Sean Christopherson	2	-11/+1
	Get LA57 from the role_regs, which are initialized from the vCPU even though TDP is enabled, instead of pulling the value directly from the vCPU when computing the guest's root_level for TDP MMUs. Note, the check is inside an is_long_mode() statement, so that requirement is not lost. Use role_regs even though the MMU's role is available and arguably "better". A future commit will consolidate the guest root level logic, and it needs access to EFER.LMA, which is not tracked in the role (it can't be toggled on VM-Exit, unlike LA57). Drop is_la57_mode() as there are no remaining users, and to discourage pulling MMU state from the vCPU (in the future). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-41-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Get nested MMU's root level from the MMU's role	Sean Christopherson	1	-5/+1
	Initialize the MMU's (guest) root_level using its mmu_role instead of redoing the calculations. The role_regs used to calculate the mmu_role are initialized from the vCPU, i.e. this should be a complete nop. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-40-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Drop "nx" from MMU context now that there are no readers	Sean Christopherson	2	-19/+0
	Drop kvm_mmu.nx as there no consumers left. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-39-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Use MMU's role to get EFER.NX during MMU configuration	Sean Christopherson	1	-3/+4
	Get the MMU's effective EFER.NX from its role instead of using the one-off, dedicated flag. This will allow dropping said flag in a future commit. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-38-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Use MMU's role/role_regs to compute context's metadata	Sean Christopherson	1	-20/+16
	Use the MMU's role and role_regs to calculate the MMU's guest root level and NX bit. For some flows, the vCPU state may not be correct (or relevant), e.g. EPT doesn't interact with EFER.NX and nested NPT will configure the guest_mmu with possibly-stale vCPU state. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-37-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Use MMU's role to detect EFER.NX in guest page walk	Sean Christopherson	1	-1/+1
	Use the NX bit from the MMU's role instead of the MMU itself so that the redundant, dedicated "nx" flag can be dropped. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-36-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Use MMU's roles to compute last non-leaf level	Sean Christopherson	1	-6/+6
	Use the MMU's role to get CR4.PSE when determining the last level at which the guest _cannot_ create a non-leaf PTE, i.e. cannot create a huge page. Note, the existing logic is arguably wrong when considering 5-level paging and the case where 1gb pages aren't supported. In practice, the logic is confusing but not broken, because except for 32-bit non-PAE paging, bit 7 (_PAGE_PSE) bit is reserved when a huge page isn't supported at that level. I.e. setting bit 7 will terminate the guest walk one way or another. Furthermore, last_nonleaf_level is only consulted after KVM has verified there are no reserved bits set. All that confusion will be addressed in a future patch by dropping last_nonleaf_level entirely. For now, massage the code to continue the march toward using mmu_role for (almost) all MMU computations. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-35-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Use MMU's role to compute PKRU bitmask	Sean Christopherson	1	-14/+7
	Use the MMU's role to calculate the Protection Keys (Restrict Userspace) bitmask instead of pulling bits from current vCPU state. For some flows, the vCPU state may not be correct (or relevant), e.g. EPT doesn't interact with PKRU. Case in point, the "ept" param simply disappears. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-34-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Use MMU's role to compute permission bitmask	Sean Christopherson	1	-9/+8
	Use the MMU's role to generate the permission bitmasks for the MMU. For some flows, the vCPU state may not be correct (or relevant), e.g. the nested NPT MMU can be initialized with incoherent vCPU state. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-33-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Drop vCPU param from reserved bits calculator	Sean Christopherson	1	-7/+4
	Drop the vCPU param from __reset_rsvds_bits_mask() as it's now unused, and ideally will remain unused in the future. Any information that's needed by the low level helper should be explicitly provided as it's used for both shadow/host MMUs and guest MMUs, i.e. vCPU state may be meaningless or simply wrong. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-32-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Use MMU's role to get CR4.PSE for computing rsvd bits	Sean Christopherson	1	-1/+1
	Use the MMU's role to get CR4.PSE when calculating reserved bits for the guest's PTEs. Practically speaking, this is a glorified nop as the role always come from vCPU state for the relevant flows, but converting to the roles will provide consistency once everything else is converted, and will Just Work if the "always comes from vCPU" behavior were ever to change (unlikely). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-31-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Don't grab CR4.PSE for calculating shadow reserved bits	Sean Christopherson	1	-6/+9
	Unconditionally pass pse=false when calculating reserved bits for shadow PTEs. CR4.PSE is only relevant for 32-bit non-PAE paging, which KVM does not use for shadow paging (including nested NPT). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-30-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Always set new mmu_role immediately after checking old role	Sean Christopherson	1	-6/+9
	Refactor shadow MMU initialization to immediately set its new mmu_role after verifying it differs from the old role, and so that all flavors of MMU initialization share the same check-and-set pattern. Immediately setting the role will allow future commits to use mmu_role to configure the MMU without consuming stale state. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-29-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Set CR4.PKE/LA57 in MMU role iff long mode is active	Sean Christopherson	1	-2/+4
	Don't set cr4_pke or cr4_la57 in the MMU role if long mode isn't active, which is required for protection keys and 5-level paging to be fully enabled. Ignoring the bit avoids unnecessary reconfiguration on reuse, and also means consumers of mmu_role don't need to manually check for long mode. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-28-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Do not set paging-related bits in MMU role if CR0.PG=0	Sean Christopherson	1	-10/+14
	Don't set CR0/CR4/EFER bits in the MMU role if paging is disabled, paging modifiers are irrelevant if there is no paging in the first place. Somewhat arbitrarily clear gpte_is_8_bytes for shadow paging if paging is disabled in the guest. Again, there are no guest PTEs to process, so the size is meaningless. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-27-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Add accessors to query mmu_role bits	Sean Christopherson	2	-1/+22
	Add accessors via a builder macro for all mmu_role bits that track a CR0, CR4, or EFER bit, abstracting whether the bits are in the base or the extended role. Future commits will switch to using mmu_role instead of vCPU state to configure the MMU, i.e. there are about to be a large number of users. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-26-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Rename "nxe" role bit to "efer_nx" for macro shenanigans	Sean Christopherson	5	-8/+8
	Rename "nxe" to "efer_nx" so that future macro magic can use the pattern <reg>_<bit> for all CR0, CR4, and EFER bits that included in the role. Using "efer_nx" also makes it clear that the role bit reflects EFER.NX, not the NX bit in the corresponding PTE. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-25-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Use MMU's role_regs, not vCPU state, to compute mmu_role	Sean Christopherson	1	-40/+52
	Use the provided role_regs to calculate the mmu_role instead of pulling bits from current vCPU state. For some flows, e.g. nested TDP, the vCPU state may not be correct (or relevant). Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-24-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Ignore CR0 and CR4 bits in nested EPT MMU role	Sean Christopherson	1	-1/+3
	Do not incorporate CR0/CR4 bits into the role for the nested EPT MMU, as EPT behavior is not influenced by CR0/CR4. Note, this is the guest_mmu, (L1's EPT), not nested_mmu (L2's IA32 paging); the nested_mmu does need CR0/CR4, and is initialized in a separate flow. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-23-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Consolidate misc updates into shadow_mmu_init_context()	Sean Christopherson	1	-11/+6
	Consolidate the MMU metadata update calls to deduplicate code, and to prep for future cleanup. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-22-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Add struct and helpers to retrieve MMU role bits from regs	Sean Christopherson	1	-13/+53
	Introduce "struct kvm_mmu_role_regs" to hold the register state that is incorporated into the mmu_role. For nested TDP, the register state that is factored into the MMU isn't vCPU state; the dedicated struct will be used to propagate the correct state throughout the flows without having to pass multiple params, and also provides helpers for the various flag accessors. Intentionally make the new helpers cumbersome/ugly by prepending four underscores. In the not-too-distant future, it will be preferable to use the mmu_role to query bits as the mmu_role can drop irrelevant bits without creating contradictions, e.g. clearing CR4 bits when CR0.PG=0. Reserve the clean helper names (no underscores) for the mmu_role. Add a helper for vCPU conversion, which is the common case. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-21-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Grab shadow root level from mmu_role for shadow MMUs	Sean Christopherson	1	-13/+5
	Use the mmu_role to initialize shadow root level instead of assuming the level of KVM's shadow root (host) is the same as that of the guest root, or in the case of 32-bit non-PAE paging where KVM forces PAE paging. For nested NPT, the shadow root level cannot be adapted to L1's NPT root level and is instead always the TDP root level because NPT uses the current host CR0/CR4/EFER, e.g. 64-bit KVM can't drop into 32-bit PAE to shadow L1's NPT. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-20-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Move nested NPT reserved bit calculation into MMU proper	Sean Christopherson	3	-7/+8
	Move nested NPT's invocation of reset_shadow_zero_bits_mask() into the MMU proper and unexport said function. Aside from dropping an export, this is a baby step toward eliminating the call entirely by fixing the shadow_root_level confusion. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86: Read and pass all CR0/CR4 role bits to shadow MMU helper	Sean Christopherson	3	-9/+10
	Grab all CR0/CR4 MMU role bits from current vCPU state when initializing a non-nested shadow MMU. Extract the masks from kvm_post_set_cr{0,4}(), as the CR0/CR4 update masks must exactly match the mmu_role bits, with one exception (see below). The "full" CR0/CR4 will be used by future commits to initialize the MMU and its role, as opposed to the current approach of pulling everything from vCPU, which is incorrect for certain flows, e.g. nested NPT. CR4.LA57 is an exception, as it can be toggled on VM-Exit (for L1's MMU) but can't be toggled via MOV CR4 while long mode is active. I.e. LA57 needs to be in the mmu_role, but technically doesn't need to be checked by kvm_post_set_cr4(). However, the extra check is completely benign as the hardware restrictions simply mean LA57 will never be _the_ cause of a MMU reset during MOV CR4. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Drop smep_andnot_wp check from "uses NX" for shadow MMUs	Sean Christopherson	1	-2/+1
	Drop the smep_andnot_wp role check from the "uses NX" calculation now that all non-nested shadow MMUs treat NX as used via the !TDP check. The shadow MMU for nested NPT, which shares the helper, does not need to deal with SMEP (or WP) as NPT walks are always "user" accesses and WP is explicitly noted as being ignored: Table walks for guest page tables are always treated as user writes at the nested page table level. A table walk for the guest page itself is always treated as a user access at the nested page table level The host hCR0.WP bit is ignored under nested paging. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-17-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: nSVM: Add a comment to document why nNPT uses vmcb01, not vCPU state	Sean Christopherson	1	-0/+6
	Add a comment in the nested NPT initialization flow to call out that it intentionally uses vmcb01 instead current vCPU state to get the effective hCR4 and hEFER for L1's NPT context. Note, despite nSVM's efforts to handle the case where vCPU state doesn't reflect L1 state, the MMU may still do the wrong thing due to pulling state from the vCPU instead of the passed in CR0/CR4/EFER values. This will be addressed in future commits. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86: Fix sizes used to pass around CR0, CR4, and EFER	Sean Christopherson	4	-9/+10
	When configuring KVM's MMU, pass CR0 and CR4 as unsigned longs, and EFER as a u64 in various flows (mostly MMU). Passing the params as u32s is functionally ok since all of the affected registers reserve bits 63:32 to zero (enforced by KVM), but it's technically wrong. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Rename unsync helper and update related comments	Sean Christopherson	3	-13/+34
	Rename mmu_need_write_protect() to mmu_try_to_unsync_pages() and update a variety of related, stale comments. Add several new comments to call out subtle details, e.g. that upper-level shadow pages are write-tracked, and that can_unsync is false iff KVM is in the process of synchronizing pages. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: Drop the intermediate "transient" __kvm_sync_page()	Sean Christopherson	1	-12/+5
	Nove the kvm_unlink_unsync_page() call out of kvm_sync_page() and into it's sole caller, and fold __kvm_sync_page() into kvm_sync_page() since the latter becomes a pure pass-through. There really should be no reason for code to do a complete sync of a shadow page outside of the full kvm_mmu_sync_roots(), e.g. the one use case that creeped in turned out to be flawed and counter-productive. Drop the stale comment about @sp->gfn needing to be write-protected, as it directly contradicts the kvm_mmu_get_page() usage. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: comment on kvm_mmu_get_page's syncing of pages	Sean Christopherson	1	-2/+11
	Explain the usage of sync_page() in kvm_mmu_get_page(), which is subtle in how and why it differs from mmu_sync_children(). Signed-off-by: Sean Christopherson <seanjc@google.com> [Split out of a different patch by Sean. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-25	KVM: x86/mmu: WARN and zap SP when sync'ing if MMU role mismatches	Sean Christopherson	2	-6/+26
	When synchronizing a shadow page, WARN and zap the page if its mmu role isn't compatible with the current MMU context, where "compatible" is an exact match sans the bits that have no meaning in the overall MMU context or will be explicitly overwritten during the sync. Many of the helpers used by sync_page() are specific to the current context, updating a SMM vs. non-SMM shadow page would use the wrong memslots, updating L1 vs. L2 PTEs might work but would be extremely bizaree, and so on and so forth. Drop the guard with respect to 8-byte vs. 4-byte PTEs in __kvm_sync_page(), it was made useless when kvm_mmu_get_page() stopped trying to sync shadow pages irrespective of the current MMU context. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>