| Commit message (Collapse) | Author | Files | Lines |
|
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable nfs4_cntl_odstate.co_odcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)
Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.
The variable nfs4_stid.sc_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
lockd_up() can call lockd_unregister_notifiers twice:
inside lockd_start_svc() when it calls lockd_svc_exit_thread()
and then in error path of lockd_up()
Patch forces lockd_start_svc() to unregister notifiers in all error cases
and removes extra unregister in error path of lockd_up().
Fixes: cb7d224f82e4 "lockd: unregister notifier blocks if the service ..."
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The spec allows us to return NFS4ERR_SEQ_FALSE_RETRY if we notice that
the client is making a call that matches a previous (slot, seqid) pair
but that *isn't* actually a replay, because some detail of the call
doesn't actually match the previous one.
Catching every such case is difficult, but we may as well catch a few
easy ones. This also handles the case described in the previous patch,
in a different way.
The spec does however require us to catch the case where the difference
is in the rpc credentials. This prevents somebody from snooping another
user's replies by fabricating retries.
(But the practical value of the attack is limited by the fact that the
replies with the most sensitive data are READ replies, which are not
normally cached.)
Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Currently our handling of 4.1+ requests without "cachethis" set is
confusing and not quite correct.
Suppose a client sends a compound consisting of only a single SEQUENCE
op, and it matches the seqid in a session slot (so it's a retry), but
the previous request with that seqid did not have "cachethis" set.
The obvious thing to do might be to return NFS4ERR_RETRY_UNCACHED_REP,
but the protocol only allows that to be returned on the op following the
SEQUENCE, and there is no such op in this case.
The protocol permits us to cache replies even if the client didn't ask
us to. And it's easy to do so in the case of solo SEQUENCE compounds.
So, when we get a solo SEQUENCE, we can either return the previously
cached reply or NFSERR_SEQ_FALSE_RETRY if we notice it differs in some
way from the original call.
Currently, we're returning a corrupt reply in the case a solo SEQUENCE
matches a previous compound with more ops. This actually matters
because the Linux client recently started doing this as a way to recover
from lost replies to idempotent operations in the case the process doing
the original reply was killed: in that case it's difficult to keep the
original arguments around to do a real retry, and the client no longer
cares what the result is anyway, but it would like to make sure that the
slot's sequence id has been incremented, and the solo SEQUENCE assures
that: if the server never got the original reply, it will increment the
sequence id. If it did get the original reply, it won't increment, and
nothing else that about the reply really matters much. But we can at
least attempt to return valid xdr!
Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The function _svc_create_xprt is local to the source and
does not need to be in global scope, so make it static.
Cleans up sparse warning:
symbol '_svc_create_xprt' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
There is no guarantee that either the request or the svc_xprt exist
by the time we get round to printing the trace message.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Using the ARRAY_SIZE macro improves the readability of the code.
Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
(sizeof(E)@p /sizeof(*E))
|
(sizeof(E)@p /sizeof(E[...]))
|
(sizeof(E)@p /sizeof(T))
)
Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Instead of granting client's full requests until we hit our DRC size
limit and then failing CREATE_SESSIONs (and hence mounts) completely,
start granting clients smaller slot tables as we approach the limit.
The factor chosen here is pretty much arbitrary.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
An NFSv4.1+ client negotiates the size of its duplicate reply cache size
in the initial CREATE_SESSION request. The server preallocates the
memory for the duplicate reply cache to ensure that we'll never fail to
record the response to a nonidempotent operation.
To prevent a few CREATE_SESSIONs from consuming all of memory we set an
upper limit based on nr_free_buffer_pages(). 1/2^10 has been too
limiting in practice; 1/2^7 is still less than one percent.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
These checks should have already be done centrally in
nfsd4_proc_compound, the checks in each individual operation are
unnecessary.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Since __state_in_grace return only true/false, make it return bool
instead of int.
Same change for the two user of it, locks_in_grace/opens_in_grace
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Commit c87fb4a378f9 ("lockd: NLM grace period shouldn't block NFSv4 opens")
made the locks_in_grace() comment be in the wrong place.
This patch move this comment just at the right place.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
This fix the following warning
fs/nfs_common/grace.c:66:1: warning: no previous prototype for function '__state_in_grace' [-Wmissing-prototypes]
by adding the missing static.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
include/linux/fs.h and include/uapi/linux/fs.h deal with much more than
just file locking. Move them to the "FILESYSTEMS (VFS and
infrastructure)" section of the MAINTAINERS file so that the first
suggestion from get_maintainer.pl isn't the file locking maintainers,
which has caused some confusion.
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
Commit 5620a0d1aac ("firmware: delete in-kernel firmware") removed the
entire firmware directory. Unfortunately it thereby also removed the
support for built-in firmware.
This restores the ability to build firmware directly into the kernel by
pruning the original Makefile to the necessary minimum. The default for
EXTRA_FIRMWARE_DIR is now the standard directory /lib/firmware/.
Fixes: 5620a0d1aac ("firmware: delete in-kernel firmware")
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Acked-by: Greg K-H <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The driver doesn't support events from address families other than IPv4
and IPv6, so ignore them. Otherwise, we risk queueing a work item before
it's initialized.
This can happen in case a VRF is configured when MROUTE_MULTIPLE_TABLES
is enabled, as the VRF driver will try to add an l3mdev rule for the
IPMR family.
Fixes: 65e65ec137f4 ("mlxsw: spectrum_router: Don't ignore IPv6 notifications")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Andreas Rammhold <andreas@rammhold.de>
Reported-by: Florian Klink <flokli@flokli.de>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix link in filter.txt.
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now skb->mstamp_skb is updated later, we also need to call
tcp_rate_skb_sent() after the update is done.
Fixes: 8c72c65b426b ("tcp: update skb->skb_mstamp more carefully")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Neither ___bpf_prog_run nor the JITs accept it.
Also adds a new test case.
Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sctp_diag would not actually dump out sk/asoc if inet_sctp_diag_fill
returns err, in which case it shouldn't mark sk dumped by setting
cb->args[3] as 1 in sctp_sock_dump().
Otherwise, it could cause some asocs to have no parent's sk dumped
in 'ss --sctp'.
So this patch is to not set cb->args[3] when inet_sctp_diag_fill()
returns err in sctp_sock_dump().
Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the
dump") tried to fix an use-after-free issue by checking !sctp_sk(sk)->ep
with holding sock and sock lock.
But Paolo noticed that endpoint could be destroyed in sctp_rcv without
sock lock protection. It means the use-after-free issue still could be
triggered when sctp_rcv put and destroy ep after sctp_sock_dump checks
!ep, although it's pretty hard to reproduce.
I could reproduce it by mdelay in sctp_rcv while msleep in sctp_close
and sctp_sock_dump long time.
This patch is to add another param cb_done to sctp_for_each_transport
and dump ep->assocs with holding tsp after jumping out of transport's
traversal in it to avoid this issue.
It can also improve sctp diag dump to make it run faster, as no need
to save sk into cb->args[5] and keep calling sctp_for_each_transport
any more.
This patch is also to use int * instead of int for the pos argument
in sctp_for_each_transport, which could make postion increment only
in sctp_for_each_transport and no need to keep changing cb->args[2]
in sctp_sock_filter and sctp_sock_dump any more.
Fixes: 86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the dump")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The default receive buffer size was reduced by recent change
to a value which was appropriate for 10G and Windows Server 2016.
But the value is too small for full performance with 40G on Azure.
Increase the default back to maximum supported by host.
Fixes: 8b5327975ae1 ("netvsc: allow controlling send/recv buffer size")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
liujian reported a problem in TCP_USER_TIMEOUT processing with a patch
in tcp_probe_timer() :
https://www.spinics.net/lists/netdev/msg454496.html
After investigations, the root cause of the problem is that we update
skb->skb_mstamp of skbs in write queue, even if the attempt to send a
clone or copy of it failed. One reason being a routing problem.
This patch prevents this, solving liujian issue.
It also removes a potential RTT miscalculation, since
__tcp_retransmit_skb() is not OR-ing TCP_SKB_CB(skb)->sacked with
TCPCB_EVER_RETRANS if a failure happens, but skb->skb_mstamp has
been changed.
A future ACK would then lead to a very small RTT sample and min_rtt
would then be lowered to this too small value.
Tested:
# cat user_timeout.pkt
--local_ip=192.168.102.64
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 `ifconfig tun0 192.168.102.64/16; ip ro add 192.0.2.1 dev tun0`
+0 < S 0:0(0) win 0 <mss 1460>
+0 > S. 0:0(0) ack 1 <mss 1460>
+.1 < . 1:1(0) ack 1 win 65530
+0 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
+0 write(4, ..., 24) = 24
+0 > P. 1:25(24) ack 1 win 29200
+.1 < . 1:1(0) ack 25 win 65530
//change the ipaddress
+1 `ifconfig tun0 192.168.0.10/16`
+1 write(4, ..., 24) = 24
+1 write(4, ..., 24) = 24
+1 write(4, ..., 24) = 24
+1 write(4, ..., 24) = 24
+0 `ifconfig tun0 192.168.102.64/16`
+0 < . 1:2(1) ack 25 win 65530
+0 `ifconfig tun0 192.168.0.10/16`
+3 write(4, ..., 24) = -1
# ./packetdrill user_timeout.pkt
Signed-off-by: Eric Dumazet <edumazet@googl.com>
Reported-by: liujian <liujian56@huawei.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rt_iif is only set to the actual egress device for the output path. The
recent change to consider the l3slave flag when returning IP_PKTINFO
works for local traffic (the correct device index is returned), but it
broke the more typical use case of packets received from a remote host
always returning the VRF index rather than the original ingress device.
Update the fixup to consider l3slave and rt_iif actually getting set.
Fixes: 1dfa76390bf05 ("net: ipv4: add check for l3slave for index returned in IP_PKTINFO")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the network interface is kept running during suspend, the net core
may call net_device_ops.ndo_start_xmit() while the Ethernet device is
still suspended, which may lead to a system crash.
E.g. on sh73a0/kzm9g and r8a73a4/ape6evm, the external Ethernet chip is
driven by a PM controlled clock. If the Ethernet registers are accessed
while the clock is not running, the system will crash with an imprecise
external abort.
As this is a race condition with a small time window, it is not so easy
to trigger at will. Using pm_test may increase your chances:
# echo 0 > /sys/module/printk/parameters/console_suspend
# echo platform > /sys/power/pm_test
# echo mem > /sys/power/state
To fix this, make sure the network interface is quietened during
suspend.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can enter a deadlock situation because there is no sufficient protection
when ndo_get_stats64() runs in process context to guard against RX or TX NAPI
contexts running in softirq, this can lead to the following lockdep splat and
actual deadlock was experienced as well with an iperf session in the background
and a while loop doing ifconfig + ethtool.
[ 5.780350] ================================
[ 5.784679] WARNING: inconsistent lock state
[ 5.789011] 4.13.0-rc7-02179-g32fae27c725d #70 Not tainted
[ 5.794561] --------------------------------
[ 5.798890] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 5.804971] swapper/0/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[ 5.810175] (&syncp->seq#2){+.?...}, at: [<c0768a28>] bcm_sysport_tx_reclaim+0x30/0x54
[ 5.818327] {SOFTIRQ-ON-W} state was registered at:
[ 5.823278] bcm_sysport_get_stats64+0x17c/0x258
[ 5.828053] dev_get_stats+0x38/0xac
[ 5.831776] rtnl_fill_stats+0x30/0x118
[ 5.835761] rtnl_fill_ifinfo+0x538/0xe24
[ 5.839921] rtmsg_ifinfo_build_skb+0x6c/0xd8
[ 5.844430] rtmsg_ifinfo_event.part.5+0x14/0x44
[ 5.849201] rtmsg_ifinfo+0x20/0x28
[ 5.852837] register_netdevice+0x628/0x6b8
[ 5.857171] register_netdev+0x14/0x24
[ 5.861051] bcm_sysport_probe+0x30c/0x438
[ 5.865280] platform_drv_probe+0x50/0xb0
[ 5.869418] driver_probe_device+0x2e8/0x450
[ 5.873817] __driver_attach+0x104/0x120
[ 5.877871] bus_for_each_dev+0x7c/0xc0
[ 5.881834] bus_add_driver+0x1b0/0x270
[ 5.885797] driver_register+0x78/0xf4
[ 5.889675] do_one_initcall+0x54/0x190
[ 5.893646] kernel_init_freeable+0x144/0x1d0
[ 5.898135] kernel_init+0x8/0x110
[ 5.901665] ret_from_fork+0x14/0x2c
[ 5.905363] irq event stamp: 24263
[ 5.908804] hardirqs last enabled at (24262): [<c08eecf0>] net_rx_action+0xc4/0x4e4
[ 5.916624] hardirqs last disabled at (24263): [<c0a7da00>] _raw_spin_lock_irqsave+0x1c/0x98
[ 5.925143] softirqs last enabled at (24258): [<c022a7fc>] irq_enter+0x84/0x98
[ 5.932524] softirqs last disabled at (24259): [<c022a918>] irq_exit+0x108/0x16c
[ 5.939985]
[ 5.939985] other info that might help us debug this:
[ 5.946576] Possible unsafe locking scenario:
[ 5.946576]
[ 5.952556] CPU0
[ 5.955031] ----
[ 5.957506] lock(&syncp->seq#2);
[ 5.960955] <Interrupt>
[ 5.963604] lock(&syncp->seq#2);
[ 5.967227]
[ 5.967227] *** DEADLOCK ***
[ 5.967227]
[ 5.973222] 1 lock held by swapper/0/0:
[ 5.977092] #0: (&(&ring->lock)->rlock){..-...}, at: [<c0768a18>] bcm_sysport_tx_reclaim+0x20/0x54
So just remove the u64_stats_update_begin()/end() pair in ndo_get_stats64()
since it does not appear to be useful for anything. No inconsistency was
observed with either ifconfig or ethtool, global TX counts equal the sum of
per-queue TX counts on a 32-bit architecture.
Fixes: 10377ba7673d ("net: systemport: Support 64bit statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When building an allmodconfig kernel with gcc-4.6, we get a rather
odd warning:
drivers/net/vrf.c: In function ‘vrf_ip6_input_dst’:
drivers/net/vrf.c:964:3: error: initialized field with side-effects overwritten [-Werror]
drivers/net/vrf.c:964:3: error: (near initialization for ‘fl6’) [-Werror]
I have no idea what this warning is even trying to say, but it does
seem like a false positive. Reordering the initialization in to match
the structure definition gets rid of the warning, and might also avoid
whatever gcc thinks is wrong here.
Fixes: 9ff74384600a ("net: vrf: Handle ipv6 multicast and link-local addresses")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
call to memset to assign 0 value immediately after allocating
memory with kzalloc is unnecesaary as kzalloc allocates the memory
filled with 0 value.
Semantic patch used to resolve this issue:
@@
expression e,e2; constant c;
statement S;
@@
e = kzalloc(e2, c);
if(e == NULL) S
- memset(e, 0, e2);
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Acked-by: Sudarsana Kalluru <sudarsana.kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similar to other Gigabyte laptops, the touchpad on P57 requires a
keyboard reset to detect Elantech touchpad correctly.
BugLink: https://bugs.launchpad.net/bugs/1594214
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
When emulating a nested VM-entry from L1 to L2, several control field
validation checks are deferred to the hardware. Should one of these
validation checks fail, vcpu_vmx_run will set the vmx->fail flag. When
this happens, the L2 guest state is not loaded (even in part), and
execution should continue in L1 with the next instruction after the
VMLAUNCH/VMRESUME.
The VMCS12 is not modified (except for the VM-instruction error
field), the VMCS12 MSR save/load lists are not processed, and the CPU
state is not loaded from the VMCS12 host area. Moreover, the vmcs02
exit reason is stale, so it should not be consulted for any reason.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
On an early VMLAUNCH/VMRESUME failure (i.e. one which sets the
VM-instruction error field of the current VMCS), the launch state of
the current VMCS is not set to "launched," and the VM-exit information
fields of the current VMCS (including IDT-vectoring information and
exit reason) are stale.
On a late VMLAUNCH/VMRESUME failure (i.e. one which sets the high bit
of the exit reason field), the launch state of the current VMCS is not
set to "launched," and only two of the VM-exit information fields of
the current VMCS are modified (exit reason and exit
qualification). The remaining VM-exit information fields of the
current VMCS (including IDT-vectoring information, in particular) are
stale.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
After a successful VM-entry, RFLAGS is cleared, with the exception of
bit 1, which is always set. This is handled by load_vmcs12_host_state.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
For example, the following could occur, making us miss a wakeup:
CPU0 CPU1
kvm_vcpu_block kvm_mips_comparecount_func
[L] swait_active(&vcpu->wq)
[S] prepare_to_swait(&vcpu->wq)
[L] if (!kvm_vcpu_has_pending_timer(vcpu))
schedule() [S] queue_timer_int(vcpu)
Ensure that the swait_active() check is not hoisted over the interrupt.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Particularly because kvmppc_fast_vcpu_kick_hv() is a callback,
ensure that we properly serialize wq active checks in order to
avoid potentially missing a wakeup due to racing with the waiter
side.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This is a generic call and can be suceptible to races
in reading the wq task_list while another task is adding
itself to the list. Add a full barrier by using the
swq_has_sleeper() helper.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
During code inspection, the following potential race was seen:
CPU0 CPU1
kvm_async_pf_task_wait apf_task_wake_one
[L] swait_active(&n->wq)
[S] prepare_to_swait(&n.wq)
[L] if (!hlist_unhahed(&n.link))
schedule() [S] hlist_del_init(&n->link);
Properly serialize swait_active() checks such that a wakeup is
not missed.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
A comment might serve future readers.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
... as we've got the new helper now. This caller already
does the right thing, hence no changes in semantics.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Which is the equivalent of what we have in regular waitqueues.
I'm not crazy about the name, but this also helps us get both
apis closer -- which iirc comes originally from the -net folks.
We also duplicate the comments for the lockless swait_active(),
from wait.h. Future users will make use of this interface.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The value of the guest_irq argument to vmx_update_pi_irte() is
ultimately coming from a KVM_IRQFD API call. Do not BUG() in
vmx_update_pi_irte() if the value is out-of bounds. (Especially,
since KVM as a whole seems to hang after that.)
Instead, print a message only once if we find that we don't have a
route for a certain IRQ (which can be out-of-bounds or within the
array).
This fixes CVE-2017-1000252.
Fixes: efc644048ecde54 ("KVM: x86: Update IRTE for posted-interrupts")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
We cannot add routes for gsi values >= KVM_MAX_IRQ_ROUTES -- see
kvm_set_irq_routing(). Hence, there is no sense in accepting them
via KVM_IRQFD. Prevent them from entering the system in the first
place.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Mainline crashes as follows when running nios2 images.
On node 0 totalpages: 65536
free_area_init_node: node 0, pgdat c8408fa0, node_mem_map c8726000
Normal zone: 512 pages used for memmap
Normal zone: 0 pages reserved
Normal zone: 65536 pages, LIFO batch:15
Unable to handle kernel NULL pointer dereference at virtual address 00000000
ea = c8003cb0, ra = c81cbf40, cause = 15
Kernel panic - not syncing: Oops
Problem is seen because get_cycles() is called before the timer it depends
on is initialized. Returning 0 in that situation fixes the problem.
Fixes: 33d72f3822d7 ("init/main.c: extract early boot entropy from the ..")
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Allow earlycon to be used on the JTAG UART present in the 3c120 GHRD.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
|
|
If L1 does not specify the "use TPR shadow" VM-execution control in
vmcs12, then L0 must specify the "CR8-load exiting" and "CR8-store
exiting" VM-execution controls in vmcs02. Failure to do so will give
the L2 VM unrestricted read/write access to the hardware CR8.
This fixes CVE-2017-12154.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This reverts commit 40f11adc7cd9281227f0a6a627d966dd0a5f0cd9.
Jens found that iwlwifi firmware loading failed on a Lenovo X1 Carbon,
gen4:
iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-34.ucode failed with error -2
iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-33.ucode failed with error -2
iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-32.ucode failed with error -2
iwlwifi 0000:04:00.0: loaded firmware version 31.532993.0 op_mode iwlmvm
iwlwifi 0000:04:00.0: Detected Intel(R) Dual Band Wireless AC 8260, REV=0x208
...
iwlwifi 0000:04:00.0: Failed to load firmware chunk!
iwlwifi 0000:04:00.0: Could not load the [0] uCode section
iwlwifi 0000:04:00.0: Failed to start INIT ucode: -110
iwlwifi 0000:04:00.0: Failed to run INIT ucode: -110
He bisected it to 40f11adc7cd9 ("PCI: Avoid race while enabling upstream
bridges"). Revert that commit to fix the regression.
Link: http://lkml.kernel.org/r/4bcbcbc1-7c79-09f0-5071-bc2f53bf6574@kernel.dk
Fixes: 40f11adc7cd9 ("PCI: Avoid race while enabling upstream bridges")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Srinath Mannam <srinath.mannam@broadcom.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Luca Coelho <luca@coelho.fi>
CC: Johannes Berg <johannes@sipsolutions.net>
CC: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
This patch constifies the path argument to kernel_read_file_from_path().
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This fixes the emulation of the dcbz instruction in the alignment
interrupt handler. The error was that we were comparing just the
instruction type field of op.type rather than the whole thing,
and therefore the comparison "type != CACHEOP + DCBZ" was always
true.
Fixes: 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to handle alignment faults")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Tested-by: Michal Sojka <sojkam1@fel.cvut.cz>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The last firmware change for the in-kernel firmware source code was back
in 2013. Everyone has been relying on the out-of-tree linux-firmware
package for a long long time.
So let's drop it, it's baggage we don't need to keep dragging around
(and having to fix random kbuild issues over time...)
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|