path: root/arch/x86/kvm/mmu.c
Commit message (Author, Date, Files, Lines changed)
...
| * | KVM: MMU: use kvm_sync_page in kvm_sync_pages (Paolo Bonzini, 2016-03-08, 1 file, -2/+1)

    If the last argument is true, kvm_unlink_unsync_page is called anyway in
    __kvm_sync_page (either by kvm_mmu_prepare_zap_page or by __kvm_sync_page
    itself). Therefore, kvm_sync_pages can just call kvm_sync_page, instead of
    going through kvm_unlink_unsync_page+__kvm_sync_page.

    Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: MMU: move TLB flush out of __kvm_sync_page (Paolo Bonzini, 2016-03-08, 1 file, -29/+24)

    By doing this, kvm_sync_pages can use __kvm_sync_page instead of
    reinventing it. Because of kvm_mmu_flush_or_zap, the code does not end up
    being more complex than before, and more cleanups to kvm_sync_pages will
    come in the next patches.

    Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: MMU: introduce kvm_mmu_flush_or_zap (Paolo Bonzini, 2016-03-08, 1 file, -9/+10)

    This is a generalization of mmu_pte_write_flush_tlb, which also takes care
    of calling kvm_mmu_commit_zap_page. The next patches will introduce more
    uses.

    Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
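    A minimal sketch of what a flush-or-zap helper along these lines could
    look like, inferred from the description above; the signature and body are
    assumptions, not code copied from mmu.c. kvm_mmu_commit_zap_page(),
    kvm_flush_remote_tlbs() and kvm_make_request(KVM_REQ_TLB_FLUSH, ...) are
    existing KVM primitives.

        /* Sketch: either commit a pending zap list (which also flushes
         * remote TLBs) or perform the requested local/remote flush. */
        static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
                                         struct list_head *invalid_list,
                                         bool remote_flush, bool local_flush)
        {
                if (!list_empty(invalid_list)) {
                        kvm_mmu_commit_zap_page(vcpu->kvm, invalid_list);
                        return;
                }

                if (remote_flush)
                        kvm_flush_remote_tlbs(vcpu->kvm);
                else if (local_flush)
                        kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
        }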
| * | KVM: MMU: check kvm_mmu_pages and mmu_page_path indices (Xiao Guangrong, 2016-03-04, 1 file, -1/+6)

    Give a special invalid index to the root of the walk, so that we can check
    the consistency of kvm_mmu_pages and mmu_page_path.

    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    [Extracted from a bigger patch proposed by Guangrong. - Paolo]
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: MMU: Fix ubsan warnings (Paolo Bonzini, 2016-03-04, 1 file, -24/+33)

    kvm_mmu_pages_init is doing some really yucky stuff. It is setting up a
    sentinel for mmu_pages_clear_parents; however, because of a) the way
    levels are numbered starting from 1 and b) the way mmu_page_path sizes its
    arrays with PT64_ROOT_LEVEL-1 elements, the access can be out of bounds.
    This is harmless because the code overwrites up to the first two elements
    of parents->idx and these are initialized, and because the sentinel is not
    needed in this case: mmu_pages_clear_parents exits anyway when it gets to
    the end of the array. However ubsan complains, and everyone else should
    too.

    This fix does three things. First it makes the mmu_page_path arrays
    PT64_ROOT_LEVEL elements in size, so that we can write to them without
    checking the level in advance. Second it disintegrates kvm_mmu_pages_init
    between mmu_unsync_walk (to reset the struct kvm_mmu_pages) and
    for_each_sp (to place the NULL sentinel at the end of the current path).
    This is okay because the mmu_page_path is only used in
    mmu_pages_clear_parents; mmu_pages_clear_parents itself is called within a
    for_each_sp iterator, and hence always after a call to mmu_pages_next.
    Third it changes mmu_pages_clear_parents to just use the sentinel to stop
    iteration, without checking the bounds on level.

    Reported-by: Sasha Levin <sasha.levin@oracle.com>
    Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: MMU: cleanup handle_abnormal_pfn (Paolo Bonzini, 2016-03-04, 1 file, -6/+2)

    The goto and temporary variable are unnecessary, just use return
    statements.

    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: MMU: apply page track notifier (Xiao Guangrong, 2016-03-03, 1 file, -2/+17)

    Register the notifier to receive write-track events so that we can update
    the shadow page table. This makes kvm_mmu_pte_write() the callback of the
    notifier; there is no functional change.

    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: MMU: simplify mmu_need_write_protect (Xiao Guangrong, 2016-03-03, 1 file, -22/+7)

    Now that all non-leaf shadow pages are write tracked, if a gfn is not
    tracked then no non-leaf shadow page for that gfn exists, and we can
    directly mark the gfn's shadow pages as unsync.

    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: MMU: use page track for non-leaf shadow pages (Xiao Guangrong, 2016-03-03, 1 file, -5/+21)

    Non-leaf shadow pages are always write protected, so they can be users of
    page tracking.

    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: MMU: clear write-flooding on the fast path of tracked page (Xiao Guangrong, 2016-03-03, 1 file, -2/+20)

    If the page fault is caused by a write access to a write-tracked page, the
    real shadow page walk is skipped, and we lose the chance to clear write
    flooding for the page-table structure the current vcpu is using.

    Fix it by locklessly walking the shadow page table to clear write flooding
    on the shadow page structure outside mmu-lock. For that, the count is
    changed to an atomic_t.

    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
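    A minimal sketch of how an atomic_t flooding counter can be cleared and
    updated without holding mmu-lock; the field and helper names below follow
    mmu.c naming conventions but are assumptions, not copied code.

        /* Sketch: with an atomic_t counter, clearing can be done from a
         * lockless shadow-page-table walk, outside mmu-lock. */
        static void clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
        {
                atomic_set(&sp->write_flooding_count, 0);
        }

        /* The detection side increments and tests the counter atomically. */
        static bool detect_write_flooding(struct kvm_mmu_page *sp)
        {
                return atomic_inc_return(&sp->write_flooding_count) >= 3;
        }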
| * | KVM: MMU: let page fault handler be aware of tracked page (Xiao Guangrong, 2016-03-03, 1 file, -7/+37)

    A page fault caused by a write access to a write-tracked page cannot be
    fixed; it always needs to be emulated. page_fault_handle_page_track() is
    the fast path introduced here to skip holding mmu-lock and walking the
    shadow page table.

    However, if the page table entry is not present, it is worth making it
    present and read-only so that read accesses are satisfied.

    mmu_need_write_protect() needs to be adjusted to avoid the page becoming
    writable when making the page table entry present or when syncing /
    prefetching shadow page table entries.

    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
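    A sketch of the fast path named above; the body is an assumption and
    relies on a page-track query helper (here kvm_page_track_is_active()) from
    the same series.

        static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
                                                 u32 error_code, gfn_t gfn)
        {
                /* Only a write fault on a present page can hit a tracked page. */
                if (!(error_code & PFERR_PRESENT_MASK) ||
                    !(error_code & PFERR_WRITE_MASK))
                        return false;

                /* A write to a write-tracked gfn cannot be fixed; emulate it. */
                return kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_WRITE);
        }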
| * | KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect (Xiao Guangrong, 2016-03-03, 1 file, -5/+11)

    Split rmap_write_protect() and introduce a function that abstracts the
    write protection based on the slot. This function will be used in a later
    patch.

    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
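    A sketch of the split described above: the new slot-based function does
    the write protection and the vcpu-based rmap_write_protect() becomes a
    thin wrapper. The rmap helpers and level constants used here
    (__gfn_to_rmap(), __rmap_write_protect(), PT_MAX_HUGEPAGE_LEVEL) are
    assumptions modeled on existing mmu.c code.

        bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
                                            struct kvm_memory_slot *slot,
                                            u64 gfn)
        {
                bool write_protected = false;
                int i;

                /* Write protect the gfn's sptes on every mapping level. */
                for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i)
                        write_protected |=
                                __rmap_write_protect(kvm, __gfn_to_rmap(gfn, i, slot),
                                                     true);

                return write_protected;
        }

        static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
        {
                struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);

                return kvm_mmu_slot_gfn_write_protect(vcpu->kvm, slot, gfn);
        }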
| * | KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage (Xiao Guangrong, 2016-03-03, 1 file, -13/+25)

    Abstract the common operations from account_shadowed() and
    unaccount_shadowed(), then introduce kvm_mmu_gfn_disallow_lpage() and
    kvm_mmu_gfn_allow_lpage(). These two functions will be used by page
    tracking in a later patch.

    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
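    A sketch of what the two helpers could look like once the common counting
    code is factored out of account_shadowed()/unaccount_shadowed(). The
    helper name update_gfn_disallow_lpage_count() and the disallow_lpage field
    follow the rename in the next entry, but the bodies are assumptions.

        static void update_gfn_disallow_lpage_count(struct kvm_memory_slot *slot,
                                                    gfn_t gfn, int count)
        {
                struct kvm_lpage_info *linfo;
                int i;

                /* Adjust the "large page disallowed" count on every large level. */
                for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
                        linfo = lpage_info_slot(gfn, slot, i);
                        linfo->disallow_lpage += count;
                        WARN_ON(linfo->disallow_lpage < 0);
                }
        }

        void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn)
        {
                update_gfn_disallow_lpage_count(slot, gfn, 1);
        }

        void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn)
        {
                update_gfn_disallow_lpage_count(slot, gfn, -1);
        }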
| * | KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed (Xiao Guangrong, 2016-03-03, 1 file, -12/+13)

    kvm_lpage_info->write_count is used to detect whether a large page mapping
    for the gfn on the specified level is allowed; rename it to disallow_lpage
    to reflect its purpose. Also rename has_wrprotected_page() to
    mmu_gfn_lpage_is_disallowed() to make the code clearer.

    Later we will extend this mechanism for page tracking: if the gfn is
    tracked, then a large mapping for that gfn on any level is not allowed.
    The new name is more straightforward.

    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: use list_last_entry (Geliang Tang, 2016-02-23, 1 file, -2/+2)

    To make the intention clearer, use list_last_entry instead of list_entry.

    Signed-off-by: Geliang Tang <geliangtang@163.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
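    An illustrative before/after; the list and member names (active_mmu_pages,
    link) are chosen as an example of the pattern, not quoted from the patch.

        /* Before: "take the last element" is hidden behind ->prev. */
        sp = list_entry(kvm->arch.active_mmu_pages.prev,
                        struct kvm_mmu_page, link);

        /* After: list_last_entry() states the intent directly; it expands to
         * list_entry(head->prev, type, member). */
        sp = list_last_entry(&kvm->arch.active_mmu_pages,
                             struct kvm_mmu_page, link);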
| * | KVM: x86: MMU: Move handle_mmio_page_fault() call to kvm_mmu_page_fault() (Takuya Yoshikawa, 2016-02-23, 1 file, -23/+16)

    Rather than placing a handle_mmio_page_fault() call in each
    vcpu->arch.mmu.page_fault() handler, moving it up to kvm_mmu_page_fault()
    makes the code better:

    - avoids code duplication
    - for kvm_arch_async_page_ready(), which is the other caller of
      vcpu->arch.mmu.page_fault(), removes an extra error_code check
    - avoids returning both RET_MMIO_PF_* values and raw integer values from
      vcpu->arch.mmu.page_fault()

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: MMU: Consolidate quickly_check_mmio_pf() and is_mmio_page_fault() (Takuya Yoshikawa, 2016-02-23, 1 file, -11/+4)

    These two have only slight differences:

    - whether 'addr' is of type u64 or of type gva_t
    - whether they have a 'direct' parameter or not

    Concerning the former, quickly_check_mmio_pf()'s u64 is better because
    'addr' needs to be able to hold both a guest physical address and a guest
    virtual address. The latter is just a stylistic issue, as we can always
    calculate the mode from the 'vcpu' as is_mmio_page_fault() does. This
    patch keeps the parameter to make the following patch cleaner.

    In addition, the patch renames the function to mmio_info_in_cache() to
    make it clear what it actually checks for.

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* / KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 (Paolo Bonzini, 2016-03-10, 1 file, -1/+3)

    KVM has special logic to handle pages with pte.u=1 and pte.w=0 when
    CR0.WP=1. These pages' SPTEs flip continuously between two states:
    U=1/W=0 (user and supervisor reads allowed, supervisor writes not allowed)
    and U=0/W=1 (supervisor reads and writes allowed, user writes not
    allowed). When SMEP is in effect, however, U=0 will enable kernel
    execution of this page. To avoid this, KVM also sets NX=1 in the shadow
    PTE together with U=0, making the two states U=1/W=0/NX=gpte.NX and
    U=0/W=1/NX=1. When guest EFER has the NX bit cleared, the reserved bit
    check thinks that the latter state is invalid; teach it that the
    smep_andnot_wp case will also use the NX bit of SPTEs.

    Cc: stable@vger.kernel.org
    Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Fixes: c258b62b264fdc469b6d3610a907708068145e3b
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* kvm: rename pfn_t to kvm_pfn_t (Dan Williams, 2016-01-16, 1 file, -18/+19)

    To date, we have implemented two I/O usage models for persistent memory,
    PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
    userspace). This series adds a third, DAX-GUP, that allows DAX mappings
    to be the target of direct-i/o. It allows userspace to coordinate
    DMA/RDMA from/to persistent memory.

    The implementation leverages the ZONE_DEVICE mm-zone that went into
    4.3-rc1 (also discussed at kernel summit) to flag pages that are owned and
    dynamically mapped by a device driver. The pmem driver, after mapping a
    persistent memory range into the system memmap via devm_memremap_pages(),
    arranges for DAX to distinguish pfn-only versus page-backed pmem-pfns via
    flags in the new pfn_t type.

    The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
    resulting pte(s) inserted into the process page tables with a new
    _PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys
    off _PAGE_DEVMAP to pin the device hosting the page range active.
    Finally, get_page() and put_page() are modified to take references against
    the device driver established page mapping.

    This need for "struct page" for persistent memory requires memory capacity
    to store the memmap array. Given that the memmap array for a large pool
    of persistent memory may exhaust available DRAM, introduce a mechanism to
    allocate the memmap from persistent memory. The new "struct vmem_altmap *"
    parameter to devm_memremap_pages() enables arch_add_memory() to use
    reserved pmem capacity rather than the page allocator.

    This patch (of 18):

    The core has developed a need for a "pfn_t" type [1]. Move the existing
    pfn_t in KVM to kvm_pfn_t [2].

    [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
    [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html

    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kvm: x86: fix comment about {mmu,nested_mmu}.gva_to_gpa (David Matlack, 2016-01-07, 1 file, -4/+6)

    The comment had the meaning of mmu.gva_to_gpa and nested_mmu.gva_to_gpa
    swapped. Fix that, and also add some details describing how each
    translation works.

    Signed-off-by: David Matlack <dmatlack@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: MMU: Use clear_page() instead of init_shadow_page_table() (Takuya Yoshikawa, 2015-12-18, 1 file, -9/+1)

    Not just in order to clean up the code, but to make it faster by using
    enhanced instructions: the initialization became 20-30% faster on our
    testing machine.

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
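    A sketch of the kind of open-coded loop being replaced and its one-line
    replacement; the exact shape of the removed function is an assumption.

        /* Before (sketch): zero the shadow page table one entry at a time. */
        static void init_shadow_page_table(struct kvm_mmu_page *sp)
        {
                int i;

                for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
                        sp->spt[i] = 0ull;
        }

        /* After: let clear_page() use the architecture's optimized stores. */
        clear_page(sp->spt);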
* KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page() (Takuya Yoshikawa, 2015-11-26, 1 file, -13/+7)

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk() (Takuya Yoshikawa, 2015-11-26, 1 file, -21/+6)

    As kvm_mmu_get_page() was changed so that every parent pointer would not
    get into the sp->parent_ptes chain before the entry pointed to by it was
    set properly, we can use the for_each_rmap_spte macro instead of
    pte_list_walk().

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page() (Takuya Yoshikawa, 2015-11-26, 1 file, -14/+9)

    Every time kvm_mmu_get_page() is called with a non-NULL parent_pte
    argument, link_shadow_page() follows that to set the parent entry so that
    the new mapping will point to the returned page table.

    Moving parent_pte handling there allows to clean up the code because
    parent_pte is passed to kvm_mmu_get_page() just for mark_unsync() and
    mmu_page_add_parent_pte().

    In addition, the patch avoids calling mark_unsync() for other parents in
    the sp->parent_ptes chain than the newly added parent_pte, because they
    have been there since before the current page fault handling started.

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page() (Takuya Yoshikawa, 2015-11-25, 1 file, -7/+7)

    Make kvm_mmu_alloc_page() do just what its name says, and remove the
    extra allocation error check and zero-initialization of parent_ptes:
    shadow page headers allocated by kmem_cache_zalloc() are always in the
    per-VCPU pools.

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: MMU: Consolidate BUG_ON checks for reverse-mapped sptes (Takuya Yoshikawa, 2015-11-25, 1 file, -9/+17)

    At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
    placed right after the call to detect unrelated sptes which must not be
    found in the reverse-mapping list.

    Move this check in rmap_get_first/next() so that all call sites, not just
    the users of the for_each_rmap_spte() macro, will be checked the same way.

    One thing to keep in mind is that kvm_mmu_unlink_parents() also uses
    rmap_get_first() to handle parent sptes. The change will not break it
    because parent sptes are present, at least until drop_parent_pte()
    actually unlinks them, and not mmio-sptes.

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte() (Takuya Yoshikawa, 2015-11-25, 1 file, -9/+4)

    is_rmap_spte(), originally named is_rmap_pte(), was introduced when the
    simple reverse mapping was implemented by commit cd4a4e5374110444
    ("[PATCH] KVM: MMU: Implement simple reverse mapping"). At that point,
    its role was clear and only rmap_add() and rmap_remove() were using it to
    select sptes that need to be reverse-mapped.

    Independently of that, is_shadow_present_pte() was first introduced by
    commit c7addb902054195b ("KVM: Allow not-present guest page faults to
    bypass kvm") to do bypass_guest_pf optimization, which does not exist any
    more.

    These two seem to have changed their roles somewhat, and is_rmap_spte()
    just calls is_shadow_present_pte() now.

    Since using both of them without clear distinction just makes the code
    confusing, remove is_rmap_spte().

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: MMU: Make mmu_set_spte() return emulate value (Takuya Yoshikawa, 2015-11-25, 1 file, -13/+14)

    mmu_set_spte()'s code is based on the assumption that the emulate
    parameter has a valid pointer value if set_spte() returns true and
    write_fault is not zero. In other cases, emulate may be NULL, so a
    NULL-check is needed.

    Stop passing the emulate pointer and make mmu_set_spte() return the
    emulate value instead to clean up this complex interface. Prefetch
    functions can just throw away the return value.

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
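    A sketch of the interface change: the emulate decision moves from an out
    parameter (which could be NULL) to the return value. The parameter lists
    are abridged and are assumptions.

        /* Before (sketch): callers that do not care pass emulate == NULL,
         * so mmu_set_spte() must NULL-check it before writing the result. */
        static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
                                 int write_fault, /* ... */ int *emulate);

        /* After (sketch): return the decision; prefetch paths ignore it. */
        static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
                                 int write_fault /* , ... */);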
* KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap (Takuya Yoshikawa, 2015-11-25, 1 file, -18/+18)

    Both __mmu_unsync_walk() and mmu_pages_clear_parents() have three lines of
    code which clear a bit in the unsync child bitmap; the former places it
    inside a loop block and uses a few goto statements to jump to it.

    A new helper function, clear_unsync_child_bit(), makes the code cleaner.

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
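    A sketch of what such a helper could look like; the field names follow the
    existing unsync_children / unsync_child_bitmap members of struct
    kvm_mmu_page, but the body is an assumption.

        static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
        {
                /* One child fewer is out of sync; keep the two counters paired. */
                --sp->unsync_children;
                WARN_ON((int)sp->unsync_children < 0);
                __clear_bit(idx, sp->unsync_child_bitmap);
        }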
* KVM: x86: MMU: Remove unused parameter of __direct_map() (Takuya Yoshikawa, 2015-11-25, 1 file, -8/+4)

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct (Takuya Yoshikawa, 2015-11-25, 1 file, -96/+100)

    New struct kvm_rmap_head makes the code type-safe to some extent.

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
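    The idea, sketched: wrap the previously bare unsigned long head in a
    one-member struct so that rmap users must go through typed helpers. The
    exact layout and the helper signature shown are assumptions.

        /*
         * The low bit(s) of val still encode whether the head points at a
         * single spte or at a pte_list_desc chain; the struct only adds type
         * safety, not new behavior.
         */
        struct kvm_rmap_head {
                unsigned long val;
        };

        /* Callers now take a struct kvm_rmap_head *, not an unsigned long. */
        static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
                                   struct rmap_iterator *iter);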
* KVM: x86: MMU: always set accessed bit in shadow PTEs (Paolo Bonzini, 2015-11-25, 1 file, -6/+3)

    Commit 7a1638ce4220 ("nEPT: Redefine EPT-specific link_shadow_page()",
    2013-08-05) says:

        Since nEPT doesn't support A/D bit, we should not set those bit when
        building the shadow page table.

    but this is not necessary. Even though nEPT doesn't support A/D bits, and
    hence the vmcs12 EPT pointer will never enable them, we always use them
    for shadow page tables if available (see construct_eptp in vmx.c). So we
    can set the A/D bits freely in the shadow page table.

    This patch hence basically reverts commit 7a1638ce4220.

    Cc: Yang Zhang <yang.z.zhang@Intel.com>
    Cc: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_common (Paolo Bonzini, 2015-11-10, 1 file, -15/+5)

    They are exactly the same, except that handle_mmio_page_fault has an
    unused argument and a call to WARN_ON. Remove the unused argument from
    the callers, and move the warning to (the former)
    handle_mmio_page_fault_common.

    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: MMU: Initialize force_pt_level before calling mapping_level() (Takuya Yoshikawa, 2015-10-19, 1 file, -3/+4)

    Commit fd1369021878 ("KVM: x86: MMU: Move mapping_level_dirty_bitmap()
    call in mapping_level()") forgot to initialize force_pt_level to false in
    FNAME(page_fault)() before calling mapping_level() like nonpaging_map()
    does. This can sometimes result in forcing page table level mapping
    unnecessarily.

    Fix this and move the first *force_pt_level check in mapping_level()
    before the kvm_vcpu_gfn_to_memslot() call, to make it a bit clearer that
    the variable must be initialized before mapping_level() gets called.

    This change can also avoid calling kvm_vcpu_gfn_to_memslot() when the
    !check_hugepage_cache_consistency() check in tdp_page_fault() forces page
    table level mapping.

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: MMU: Eliminate an extra memory slot search in mapping_level() (Takuya Yoshikawa, 2015-10-16, 1 file, -6/+11)

    Calling kvm_vcpu_gfn_to_memslot() twice in mapping_level() should be
    avoided since getting a slot by binary search may not be negligible,
    especially for virtual machines with many memory slots.

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: MMU: Remove mapping_level_dirty_bitmap() (Takuya Yoshikawa, 2015-10-16, 1 file, -8/+16)

    Now that it has only one caller, and its name is not so helpful for
    readers, remove it. The new memslot_valid_for_gpte() function makes it
    possible to share the common code between gfn_to_memslot_dirty_bitmap()
    and mapping_level().

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: MMU: Move mapping_level_dirty_bitmap() call in mapping_level() (Takuya Yoshikawa, 2015-10-16, 1 file, -15/+14)

    This is necessary to eliminate an extra memory slot search later.

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: MMU: Make force_pt_level bool (Takuya Yoshikawa, 2015-10-16, 1 file, -4/+4)

    This will be passed to a function later.

    Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: manually unroll bad_mt_xwr loop (Paolo Bonzini, 2015-10-16, 1 file, -8/+10)

    The loop is computing one of two constants, so it can be simpler to write
    everything inline.

    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: introduce lapic_in_kernel (Paolo Bonzini, 2015-10-01, 1 file, -1/+1)

    Avoid pointer chasing and memory barriers, and simplify the code when
    split irqchip (LAPIC in kernel, IOAPIC/PIC in userspace) is introduced.

    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: fix off-by-one in reserved bits check (Paolo Bonzini, 2015-09-25, 1 file, -1/+1)

    29ecd6601904 ("KVM: x86: avoid uninitialized variable warning",
    2015-09-06) introduced a not-so-subtle problem, which probably escaped
    review because it was not part of the patch context.

    Before the patch, leaf was always equal to iterator.level. After, it is
    equal to iterator.level - 1 in the call to is_shadow_zero_bits_set, and
    when is_shadow_zero_bits_set does another "-1" the check on reserved bits
    becomes incorrect. Using "iterator.level" in the call fixes this call
    trace:

      WARNING: CPU: 2 PID: 17000 at arch/x86/kvm/mmu.c:3385 handle_mmio_page_fault.part.93+0x1a/0x20 [kvm]()
      Modules linked in: tun sha256_ssse3 sha256_generic drbg binfmt_misc ipv6 vfat fat fuse dm_crypt dm_mod kvm_amd kvm crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd fam15h_power amd64_edac_mod k10temp edac_core amdkfd amd_iommu_v2 radeon acpi_cpufreq [...]
      Call Trace:
        dump_stack+0x4e/0x84
        warn_slowpath_common+0x95/0xe0
        warn_slowpath_null+0x1a/0x20
        handle_mmio_page_fault.part.93+0x1a/0x20 [kvm]
        tdp_page_fault+0x231/0x290 [kvm]
        ? emulator_pio_in_out+0x6e/0xf0 [kvm]
        kvm_mmu_page_fault+0x36/0x240 [kvm]
        ? svm_set_cr0+0x95/0xc0 [kvm_amd]
        pf_interception+0xde/0x1d0 [kvm_amd]
        handle_exit+0x181/0xa70 [kvm_amd]
        ? kvm_arch_vcpu_ioctl_run+0x68b/0x1730 [kvm]
        kvm_arch_vcpu_ioctl_run+0x6f6/0x1730 [kvm]
        ? kvm_arch_vcpu_ioctl_run+0x68b/0x1730 [kvm]
        ? preempt_count_sub+0x9b/0xf0
        ? mutex_lock_killable_nested+0x26f/0x490
        ? preempt_count_sub+0x9b/0xf0
        kvm_vcpu_ioctl+0x358/0x710 [kvm]
        ? __fget+0x5/0x210
        ? __fget+0x101/0x210
        do_vfs_ioctl+0x2f4/0x560
        ? __fget_light+0x29/0x90
        SyS_ioctl+0x4c/0x90
        entry_SYSCALL_64_fastpath+0x16/0x73
      ---[ end trace 37901c8686d84de6 ]---

    Reported-by: Borislav Petkov <bp@alien8.de>
    Tested-by: Borislav Petkov <bp@alien8.de>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: use correct page table format to check nested page table reserved bits (Paolo Bonzini, 2015-09-25, 1 file, -6/+17)

    Intel CPUID on AMD host or vice versa is a weird case, but it can happen.
    Handle it by checking the host CPU vendor instead of the guest's in
    reset_tdp_shadow_zero_bits_mask. For speed, the check uses the fact that
    Intel EPT has an X (executable) bit while AMD NPT has NX.

    Reported-by: Borislav Petkov <bp@alien8.de>
    Tested-by: Borislav Petkov <bp@alien8.de>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: avoid uninitialized variable warning (Paolo Bonzini, 2015-09-06, 1 file, -3/+4)

    This does not show up on all compiler versions, so it sneaked into the
    first 4.3 pull request. The fix is to mimic the logic of the "print
    sptes" loop in the "fill array" loop. Then leaf and root can be both
    initialized unconditionally.

    Note that "leaf" now points to the first unused element of the array, not
    the last filled element.

    Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: VMX: drop ept misconfig check (Xiao Guangrong, 2015-08-05, 1 file, -22/+0)

    The logic used to check ept misconfig is completely contained in the
    common reserved bits check for sptes, so it can be removed.

    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: MMU: fully check zero bits for sptes (Xiao Guangrong, 2015-08-05, 1 file, -6/+37)

    The #PF with PFEC.RSV = 1 is designed to speed up MMIO emulation; however,
    it is possible that the RSV #PF is caused by a real BUG, via mis-configured
    shadow page table entries.

    This patch enables a full check for the zero bits on shadow page table
    entries (which include not only bits reserved by the hardware, but also
    bits that will never be set in the SPTE), and then dumps the shadow page
    table hierarchy.

    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: MMU: introduce is_shadow_zero_bits_set() (Xiao Guangrong, 2015-08-05, 1 file, -9/+19)

    We have the same data struct to check reserved bits on guest page tables
    and shadow page tables; split is_rsvd_bits_set() so that the logic can be
    shared between these two paths.

    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: MMU: introduce the framework to check zero bits on sptes (Xiao Guangrong, 2015-08-05, 1 file, -0/+50)

    We have abstracted the data struct and functions which are used to check
    reserved bits on guest page tables; now we extend the logic to check zero
    bits on shadow page tables.

    The zero bits on sptes include not only reserved bits on hardware but also
    the bits that SPTEs will never use. For example, shadow pages will never
    use GB pages unless the guest uses them too.

    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: MMU: split reset_rsvds_bits_mask_ept (Xiao Guangrong, 2015-08-05, 1 file, -4/+10)

    Since shadow ept page tables and Intel nested guest page tables have the
    same format, split reset_rsvds_bits_mask_ept so that the logic can be
    reused by later patches which check zero bits on sptes.

    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: MMU: split reset_rsvds_bits_mask (Xiao Guangrong, 2015-08-05, 1 file, -8/+18)

    Since softmmu & AMD nested shadow page tables and guest page tables have
    the same format, split reset_rsvds_bits_mask so that the logic can be
    reused by later patches which check zero bits on sptes.

    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: MMU: introduce rsvd_bits_validate (Xiao Guangrong, 2015-08-05, 1 file, -33/+42)

    The two fields rsvd_bits_mask and bad_mt_xwr in "struct kvm_mmu" are used
    to check whether reserved bits are set on guest ptes. Move them to a data
    struct so that the approach can be applied to check host shadow page table
    entries as well.

    Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
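    A sketch of the extracted struct and how struct kvm_mmu could then hold
    one instance for guest checks and one for shadow (host) checks, as the
    entries above describe; the array dimensions and field names beyond
    rsvd_bits_mask and bad_mt_xwr are illustrative assumptions.

        struct rsvd_bits_validate {
                /* one mask per (pte page-size bit, level) combination */
                u64 rsvd_bits_mask[2][4];
                /* EPT: memory-type/XWR combinations that are illegal */
                u64 bad_mt_xwr;
        };

        struct kvm_mmu {
                /* ... */
                struct rsvd_bits_validate guest_rsvd_check;  /* guest ptes */
                struct rsvd_bits_validate shadow_zero_check; /* host sptes */
        };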