| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
If a nested guest does a NM fault but its CR0 doesn't contain the TS
flag (because it was already cleared by the guest with L1 aid) then we
have to activate FPU ourselves in L0 and then continue to L2. If TS flag
is set then we fallback on the previous behavior, forward the fault to
L1 if it asked for.
Signed-off-by: Anthoine Bourgeois <bourgeois@bertin.fr>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into kvm-next
A handful of fixes for KVM/arm64:
- A couple a basic fixes for running BE guests on a LE host
- A performance improvement for overcommitted VMs (same as the equivalent
patch for ARM)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Conflicts:
arch/arm/include/asm/kvm_emulate.h
arch/arm64/include/asm/kvm_emulate.h
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When booting a vcpu using PSCI, make sure we start it with the
endianness of the caller. Otherwise, secondaries can be pretty
unhappy to execute a BE kernel in LE mode...
This conforms to PSCI spec Rev B, 5.13.3.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Do the necessary byteswap when host and guest have different
views of the universe. Actually, the only case we need to take
care of is when the guest is BE. All the other cases are naturally
handled.
Also be careful about endianness when the data is being memcopy-ed
from/to the run buffer.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
On an (even slightly) oversubscribed system, spinlocks are quickly
becoming a bottleneck, as some vcpus are spinning, waiting for a
lock to be released, while the vcpu holding the lock may not be
running at all.
The solution is to trap blocking WFEs and tell KVM that we're
now spinning. This ensures that other vpus will get a scheduling
boost, allowing the lock to be released more quickly. Also, using
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance
when the VM is severely overcommited.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.linaro.org/people/cdall/linux-kvm-arm into kvm-next
Updates for KVM/ARM, take 3 supporting more than 4 CPUs.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Conflicts:
arch/arm/kvm/reset.c [cpu_reset->reset_regs change; context only]
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The KVM PSCI code blindly assumes that vcpu_id and MPIDR are
the same thing. This is true when vcpus are organized as a flat
topology, but is wrong when trying to emulate any other topology
(such as A15 clusters).
Change the KVM PSCI CPU_ON code to look at the MPIDR instead
of the vcpu_id to pick a target CPU.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Now that the KVM/arm code knows about affinity, remove the hard
limit of 4 vcpus per VM.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The L2CTLR register contains the number of CPUs in this cluster.
Make sure the register content is actually relevant to the vcpu
that is being configured by computing the number of cores that are
part of its cluster.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In order to be able to support more than 4 A7 or A15 CPUs,
we need to fix the MPIDR computing to reflect the fact that
both A15 and A7 can only exist in clusters of at most 4 CPUs.
Fix the MPIDR computing to allow virtual clusters to be exposed
to the guest.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We need to copy padding to kernel space first before looking at it.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The prototype for kvm_check_iopl appeared in commit
f850e2e603bf5a05b0aee7901857cf85715aa694 ("KVM: x86 emulator: Check IOPL
level during io instruction emulation"), but the function never actually
existed. Remove the prototype.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
complete_pio ceased to exist in commit
7972995b0c346de76fe260ce0fd6bcc8ffab724a ("KVM: x86 emulator: Move
string pio emulation into emulator.c"), but the prototype remained.
Remove its prototype.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In certain occasions it is possible for a hung task detector
positive to be false: continuation from a paused VM, for example.
Add a method to reset detection, similar as is done
with other kernel watchdogs.
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Implement reset of kernel watchdogs at pvclock read time. This avoids
adding special code to every watchdog.
This is possible for watchdogs which measure time based on sched_clock() or
ktime_get() variants.
Suggested by Don Zickus.
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
I noticed that srcu_read_lock/unlock both have a memory barrier,
so just by moving srcu_read_unlock earlier we can get rid of
one call to smp_mb() using smp_mb__after_srcu_read_unlock instead.
Unsurprisingly, the gain is small but measureable using the unit test
microbenchmark:
before
vmcall in the ballpark of 1410 cycles
after
vmcall in the ballpark of 1360 cycles
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently cpuid emulation is traced only when executed by intercept.
Move trace point so that emulator invocation is traced too.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make code shorter.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
All decode_register() callers check if instruction has rex prefix
to properly decode one byte operand. It make sense to move the check
inside.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | | |
Conflicts:
arch/powerpc/include/asm/processor.h
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
drop is_hv_enabled, because that should not be a callback property
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This moves the kvmppc_ops callbacks to be a per VM entity. This
enables us to select HV and PR mode when creating a VM. We also
allow both kvm-hv and kvm-pr kernel module to be loaded. To
achieve this we move /dev/kvm ownership to kvm.ko module. Depending on
which KVM mode we select during VM creation we take a reference
count on respective module
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: fix coding style]
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We will use that in the later patch to find the kvm ops handler
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: squash in compile fix]
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch moves PR related tracepoints to a separate header. This
enables in converting PR to a kernel module which will be done in
later patches
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This help us to identify whether we are running with hypervisor mode KVM
enabled. The change is needed so that we can have both HV and PR kvm
enabled in the same kernel.
If both HV and PR KVM are included, interrupts come in to the HV version
of the kvmppc_interrupt code, which then jumps to the PR handler,
renamed to kvmppc_interrupt_pr, if the guest is a PR guest.
Allowing both PR and HV in the same kernel required some changes to
kvm_dev_ioctl_check_extension(), since the values returned now can't
be selected with #ifdefs as much as previously. We look at is_hv_enabled
to return the right value when checking for capabilities.For capabilities that
are only provided by HV KVM, we return the HV value only if
is_hv_enabled is true. For capabilities provided by PR KVM but not HV,
we return the PR value only if is_hv_enabled is false.
NOTE: in later patch we replace is_hv_enabled with a static inline
function comparing kvm_ppc_ops
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
With this patch if HV is included, interrupts come in to the HV version
of the kvmppc_interrupt code, which then jumps to the PR handler,
renamed to kvmppc_interrupt_pr, if the guest is a PR guest. This helps
in enabling both HV and PR, which we do in later patch
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch add a new callback kvmppc_ops. This will help us in enabling
both HV and PR KVM together in the same kernel. The actual change to
enable them together is done in the later patch in the series.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: squash in booke changes]
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This help ups to select the relevant code in the kernel code
when we later move HV and PR bits as seperate modules. The patch
also makes the config options for PR KVM selectable
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
With later patches supporting PR kvm as a kernel module, the changes
that has to be built into the main kernel binary to enable PR KVM module
is now selected via KVM_BOOK3S_PR_POSSIBLE
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Since the code in book3s_64_vio_hv.c is called from real mode with HV
KVM, and therefore has to be built into the main kernel binary, this
makes it always built-in rather than part of the KVM module. It gets
called from the KVM module by PR KVM, so this adds an EXPORT_SYMBOL_GPL().
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This label is not used now.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch adds the debug stub support on booke/bookehv.
Now QEMU debug stub can use hw breakpoint, watchpoint and
software breakpoint to debug guest.
This is how we save/restore debug register context when switching
between guest, userspace and kernel user-process:
When QEMU is running
-> thread->debug_reg == QEMU debug register context.
-> Kernel will handle switching the debug register on context switch.
-> no vcpu_load() called
QEMU makes ioctls (except RUN)
-> This will call vcpu_load()
-> should not change context.
-> Some ioctls can change vcpu debug register, context saved in vcpu->debug_regs
QEMU Makes RUN ioctl
-> Save thread->debug_reg on STACK
-> Store thread->debug_reg == vcpu->debug_reg
-> load thread->debug_reg
-> RUN VCPU ( So thread points to vcpu context )
Context switch happens When VCPU running
-> makes vcpu_load() should not load any context
-> kernel loads the vcpu context as thread->debug_regs points to vcpu context.
On heavyweight_exit
-> Load the context saved on stack in thread->debug_reg
Currently we do not support debug resource emulation to guest,
On debug exception, always exit to user space irrespective of
user space is expecting the debug exception or not. If this is
unexpected exception (breakpoint/watchpoint event not set by
userspace) then let us leave the action on user space. This
is similar to what it was before, only thing is that now we
have proper exit state available to user space.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
For KVM also use the "struct debug_reg" defined in asm/processor.h
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
"ehpriv 1" instruction is used for setting software breakpoints
by user space. This patch adds support to exit to user space
with "run->debug" have relevant information.
As this is the first point we are using run->debug, also defined
the run->debug structure.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
KVM need this function when switching from vcpu to user-space
thread. My subsequent patch will use this function.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This way we can use same data type struct with KVM and
also help in using other debug related function.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Mark the guest page as accessed so that there is likely
less chances of this page getting swap-out.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
"G" bit in MAS2 indicates whether the page is Guarded.
There is no reason to stop guest setting "G", so allow him.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
"E" bit in MAS2 bit indicates whether the page is accessed
in Little-Endian or Big-Endian byte order.
There is no reason to stop guest setting "E", so allow him."
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
For booke3e _PAGE_ENDIAN is not defined. Infact what is defined
is "_PAGE_LENDIAN" which is wrong and that should be _PAGE_ENDIAN.
There are no compilation errors as
arch/powerpc/include/asm/pte-common.h defines _PAGE_ENDIAN to 0
as it is not defined anywhere.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When an interrupt or exception happens in the guest that comes to the
host, the CPU goes to hypervisor real mode (MMU off) to handle the
exception but doesn't change the MMU context. After saving a few
registers, we then clear the "in guest" flag. If, for any reason,
we get an exception in the real-mode code, that then gets handled
by the normal kernel exception handlers, which turn the MMU on. This
is disastrous if the MMU is still set to the guest context, since we
end up executing instructions from random places in the guest kernel
with hypervisor privilege.
In order to catch this situation, we define a new value for the "in guest"
flag, KVM_GUEST_MODE_HOST_HV, to indicate that we are in hypervisor real
mode with guest MMU context. If the "in guest" flag is set to this value,
we branch off to an emergency handler. For the moment, this just does
a branch to self to stop the CPU from doing anything further.
While we're here, we define another new flag value to indicate that we
are in a HV guest, as distinct from a PR guest. This will be useful
when we have a kernel that can support both PR and HV guests concurrently.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
add kvmppc_free_vcores() to free the kvmppc_vcore structures
that we allocate for a guest, which are currently being leaked.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently, whenever any of the MMU notifier callbacks get called, we
invalidate all the shadow PTEs. This is inefficient because it means
that we typically then get a lot of DSIs and ISIs in the guest to fault
the shadow PTEs back in. We do this even if the address range being
notified doesn't correspond to guest memory.
This commit adds code to scan the memslot array to find out what range(s)
of guest physical addresses corresponds to the host virtual address range
being affected. For each such range we flush only the shadow PTEs
for the range, on all cpus.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The mark_page_dirty() function, despite what its name might suggest,
doesn't actually mark the page as dirty as far as the MM subsystem is
concerned. It merely sets a bit in KVM's map of dirty pages, if
userspace has requested dirty tracking for the relevant memslot.
To tell the MM subsystem that the page is dirty, we have to call
kvm_set_pfn_dirty() (or an equivalent such as SetPageDirty()).
This adds a call to kvm_set_pfn_dirty(), and while we are here, also
adds a call to kvm_set_pfn_accessed() to tell the MM subsystem that
the page has been accessed. Since we are now using the pfn in
several places, this adds a 'pfn' variable to store it and changes
the places that used hpaddr >> PAGE_SHIFT to use pfn instead, which
is the same thing.
This also changes a use of HPTE_R_PP to PP_RXRX. Both are 3, but
PP_RXRX is more informative as being the read-only page permission
bit setting.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When the MM code is invalidating a range of pages, it calls the KVM
kvm_mmu_notifier_invalidate_range_start() notifier function, which calls
kvm_unmap_hva_range(), which arranges to flush all the existing host
HPTEs for guest pages. However, the Linux PTEs for the range being
flushed are still valid at that point. We are not supposed to establish
any new references to pages in the range until the ...range_end()
notifier gets called. The PPC-specific KVM code doesn't get any
explicit notification of that; instead, we are supposed to use
mmu_notifier_retry() to test whether we are or have been inside a
range flush notifier pair while we have been getting a page and
instantiating a host HPTE for the page.
This therefore adds a call to mmu_notifier_retry inside
kvmppc_mmu_map_page(). This call is inside a region locked with
kvm->mmu_lock, which is the same lock that is called by the KVM
MMU notifier functions, thus ensuring that no new notification can
proceed while we are in the locked region. Inside this region we
also create the host HPTE and link the corresponding hpte_cache
structure into the lists used to find it later. We cannot allocate
the hpte_cache structure inside this locked region because that can
lead to deadlock, so we allocate it outside the region and free it
if we end up not using it.
This also moves the updates of vcpu3s->hpte_cache_count inside the
regions locked with vcpu3s->mmu_lock, and does the increment in
kvmppc_mmu_hpte_cache_map() when the pte is added to the cache
rather than when it is allocated, in order that the hpte_cache_count
is accurate.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently we request write access to all pages that get mapped into the
guest, even if the guest is only loading from the page. This reduces
the effectiveness of KSM because it means that we unshare every page we
access. Also, we always set the changed (C) bit in the guest HPTE if
it allows writing, even for a guest load.
This fixes both these problems. We pass an 'iswrite' flag to the
mmu.xlate() functions and to kvmppc_mmu_map_page() to indicate whether
the access is a load or a store. The mmu.xlate() functions now only
set C for stores. kvmppc_gfn_to_pfn() now calls gfn_to_pfn_prot()
instead of gfn_to_pfn() so that it can indicate whether we need write
access to the page, and get back a 'writable' flag to indicate whether
the page is writable or not. If that 'writable' flag is clear, we then
make the host HPTE read-only even if the guest HPTE allowed writing.
This means that we can get a protection fault when the guest writes to a
page that it has mapped read-write but which is read-only on the host
side (perhaps due to KSM having merged the page). Thus we now call
kvmppc_handle_pagefault() for protection faults as well as HPTE not found
faults. In kvmppc_handle_pagefault(), if the access was allowed by the
guest HPTE and we thus need to install a new host HPTE, we then need to
remove the old host HPTE if there is one. This is done with a new
function, kvmppc_mmu_unmap_page(), which uses kvmppc_mmu_pte_vflush() to
find and remove the old host HPTE.
Since the memslot-related functions require the KVM SRCU read lock to
be held, this adds srcu_read_lock/unlock pairs around the calls to
kvmppc_handle_pagefault().
Finally, this changes kvmppc_mmu_book3s_32_xlate_pte() to not ignore
guest HPTEs that don't permit access, and to return -EPERM for accesses
that are not permitted by the page protections.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Both PR and HV KVM have separate, identical copies of the
kvmppc_skip_interrupt and kvmppc_skip_Hinterrupt handlers that are
used for the situation where an interrupt happens when loading the
instruction that caused an exit from the guest. To eliminate this
duplication and make it easier to compile in both PR and HV KVM,
this moves this code to arch/powerpc/kernel/exceptions-64s.S along
with other kernel interrupt handler code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|