| Commit message (Collapse) | Author | Files | Lines |
|
The ioctl IOC_LIBCFS_PING_TEST has not been used in ages. The recent
nidstring changes which moved all the nidstring operations from libcfs
to the LNet layer but this ioctl code was still using an nidstring
operation that was causing a circular dependency loop between libcfs and
LNet.
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
we already zero it on outermost set_nameidata(), so initialization in
path_init() is pointless and wrong. The same DoS exists on pre-4.2
kernels, but there a slightly different fix will be needed.
Cc: stable@vger.kernel.org # v4.2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
[Al Viro] The bug is in being too enthusiastic about optimizing ->setattr()
away - instead of "copy verbatim with metadata" + "chmod/chown/utimes"
(with the former being always safe and the latter failing in case of
insufficient permissions) it tries to combine these two. Note that copyup
itself will have to do ->setattr() anyway; _that_ is where the elevated
capabilities are right. Having these two ->setattr() (one to set verbatim
copy of metadata, another to do what overlayfs ->setattr() had been asked
to do in the first place) combined is where it breaks.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
When restarting a syscall with regs->ax == -ERESTART_RESTARTBLOCK,
regs->ax is assigned to a restart_syscall number. For x32 tasks, this
syscall number must have __X32_SYSCALL_BIT set, otherwise it will be
an x86_64 syscall number instead of a valid x32 syscall number. This
issue has been there since the introduction of x32.
Reported-by: strace/tests/restart_syscall.test
Reported-and-tested-by: Elvira Khabirova <lineprinter0@gmail.com>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: Elvira Khabirova <lineprinter0@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20151130215436.GA25996@altlinux.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
MPX decodes instructions in order to tell which bounds register
was violated. Part of this decoding involves looking at the "REX
prefix" which is a special instrucion prefix used to retrofit
support for new registers in to old instructions.
The X86_REX_*() macros are defined to return actual bit values:
#define X86_REX_R(rex) ((rex) & 4)
*not* boolean values. However, the MPX code was checking for
them like they were booleans. This might have led to us
mis-decoding the "REX prefix" and giving false information out to
userspace about bounds violations. X86_REX_B() actually is bit 1,
so this is really only broken for the X86_REX_X() case.
Fix the conditionals up to tolerate the non-boolean values.
Fixes: fcc7ffd67991 "x86, mpx: Decode MPX instruction to get bound violation information"
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave@sr71.net>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20151201003113.D800C1E0@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
commit 4dfd6486 "drm: Use vblank timestamps to guesstimate how many
vblanks were missed" introduced in Linux 4.4-rc1 makes the drm core
more fragile to drivers which don't update hw vblank counters and
vblank timestamps in sync with firing of the vblank irq and
essentially at leading edge of vblank.
This exposed a problem with radeon-kms/amdgpu-kms which do not
satisfy above requirements:
The vblank irq fires a few scanlines before start of vblank, but
programmed pageflips complete at start of vblank and
vblank timestamps update at start of vblank, whereas the
hw vblank counter increments only later, at start of vsync.
This leads to problems like off by one errors for vblank counter
updates, vblank counters apparently going backwards or vblank
timestamps apparently having time going backwards. The net result
is stuttering of graphics in games, or little hangs, as well as
total failure of timing sensitive applications.
See bug #93147 for an example of the regression on Linux 4.4-rc:
https://bugs.freedesktop.org/show_bug.cgi?id=93147
This patch tries to align all above events better from the
viewpoint of the drm core / of external callers to fix the problem:
1. The apparent start of vblank is shifted a few scanlines earlier,
so the vblank irq now always happens after start of this extended
vblank interval and thereby drm_update_vblank_count() always samples
the updated vblank count and timestamp of the new vblank interval.
To achieve this, the reporting of scanout positions by
radeon_get_crtc_scanoutpos() now operates as if the vblank starts
radeon_crtc->lb_vblank_lead_lines before the real start of the hw
vblank interval. This means that the vblank timestamps which are based
on these scanout positions will now update at this earlier start of
vblank.
2. The driver->get_vblank_counter() function will bump the returned
vblank count as read from the hw by +1 if the query happens after
the shifted earlier start of the vblank, but before the real hw increment
at start of vsync, so the counter appears to increment at start of vblank
in sync with the timestamp update.
3. Calls from vblank irq-context and regular non-irq calls are now
treated identical, always simulating the shifted vblank start, to
avoid inconsistent results for queries happening from vblank irq vs.
happening from drm_vblank_enable() or vblank_disable_fn().
4. The radeon_flip_work_func will delay mmio programming a pageflip until
the start of the real vblank iff it happens to execute inside the shifted
earlier start of the vblank, so pageflips now also appear to execute at
start of the shifted vblank, in sync with vblank counter and timestamp
updates. This to avoid some races between updates of vblank count and
timestamps that are used for swap scheduling and pageflip execution which
could cause pageflips to execute before the scheduled target vblank.
The lb_vblank_lead_lines "fudge" value is calculated as the size of
the display controllers line buffer in scanlines for the given video
mode: Vblank irq's are triggered by the line buffer logic when the line
buffer refill for a video frame ends, ie. when the line buffer source read
position enters the hw vblank. This means that a vblank irq could fire at
most as many scanlines before the current reported scanout position of the
crtc timing generator as the number of scanlines the line buffer can
maximally hold for a given video mode.
This patch has been successfully tested on a RV730 card with DCE-3 display
engine and on a evergreen card with DCE-4 display engine, in single-display
and dual-display configuration, with different video modes.
A similar patch is needed for amdgpu-kms to fix the same problem.
Limitations:
- Maybe replace the udelay() in the flip_work_func() by a suitable
usleep_range() for a bit better efficiency? Will try that.
- Line buffer sizes in pixels are hard-coded on < DCE-4 to a value
i just guessed to be high enough to work ok, lacking info on the true
sizes atm.
Probably fixes: fdo#93147
Port of Mario's radeon fix to amdgpu.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(v1) Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com>
(v2) Refine amdgpu_flip_work_func() for better efficiency.
In amdgpu_flip_work_func, replace the busy waiting udelay(5)
with event lock held by a more performance and energy efficient
usleep_range() until at least predicted true start of hw vblank,
with some slack for scheduler happiness. Release the event lock
during waits to not delay other outputs in doing their stuff, as
the waiting can last up to 200 usecs in some cases.
Also small fix to code comment and formatting in that function.
(v2) Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
(v3) Fix crash in crtc disabled case
|
|
commit 4dfd6486 "drm: Use vblank timestamps to guesstimate how many
vblanks were missed" introduced in Linux 4.4-rc1 makes the drm core
more fragile to drivers which don't update hw vblank counters and
vblank timestamps in sync with firing of the vblank irq and
essentially at leading edge of vblank.
This exposed a problem with radeon-kms/amdgpu-kms which do not
satisfy above requirements:
The vblank irq fires a few scanlines before start of vblank, but
programmed pageflips complete at start of vblank and
vblank timestamps update at start of vblank, whereas the
hw vblank counter increments only later, at start of vsync.
This leads to problems like off by one errors for vblank counter
updates, vblank counters apparently going backwards or vblank
timestamps apparently having time going backwards. The net result
is stuttering of graphics in games, or little hangs, as well as
total failure of timing sensitive applications.
See bug #93147 for an example of the regression on Linux 4.4-rc:
https://bugs.freedesktop.org/show_bug.cgi?id=93147
This patch tries to align all above events better from the
viewpoint of the drm core / of external callers to fix the problem:
1. The apparent start of vblank is shifted a few scanlines earlier,
so the vblank irq now always happens after start of this extended
vblank interval and thereby drm_update_vblank_count() always samples
the updated vblank count and timestamp of the new vblank interval.
To achieve this, the reporting of scanout positions by
radeon_get_crtc_scanoutpos() now operates as if the vblank starts
radeon_crtc->lb_vblank_lead_lines before the real start of the hw
vblank interval. This means that the vblank timestamps which are based
on these scanout positions will now update at this earlier start of
vblank.
2. The driver->get_vblank_counter() function will bump the returned
vblank count as read from the hw by +1 if the query happens after
the shifted earlier start of the vblank, but before the real hw increment
at start of vsync, so the counter appears to increment at start of vblank
in sync with the timestamp update.
3. Calls from vblank irq-context and regular non-irq calls are now
treated identical, always simulating the shifted vblank start, to
avoid inconsistent results for queries happening from vblank irq vs.
happening from drm_vblank_enable() or vblank_disable_fn().
4. The radeon_flip_work_func will delay mmio programming a pageflip until
the start of the real vblank iff it happens to execute inside the shifted
earlier start of the vblank, so pageflips now also appear to execute at
start of the shifted vblank, in sync with vblank counter and timestamp
updates. This to avoid some races between updates of vblank count and
timestamps that are used for swap scheduling and pageflip execution which
could cause pageflips to execute before the scheduled target vblank.
The lb_vblank_lead_lines "fudge" value is calculated as the size of
the display controllers line buffer in scanlines for the given video
mode: Vblank irq's are triggered by the line buffer logic when the line
buffer refill for a video frame ends, ie. when the line buffer source read
position enters the hw vblank. This means that a vblank irq could fire at
most as many scanlines before the current reported scanout position of the
crtc timing generator as the number of scanlines the line buffer can
maximally hold for a given video mode.
This patch has been successfully tested on a RV730 card with DCE-3 display
engine and on a evergreen card with DCE-4 display engine, in single-display
and dual-display configuration, with different video modes.
A similar patch is needed for amdgpu-kms to fix the same problem.
Limitations:
- Line buffer sizes in pixels are hard-coded on < DCE-4 to a value
i just guessed to be high enough to work ok, lacking info on the true
sizes atm.
Fixes: fdo#93147
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: Harry Wentland <Harry.Wentland@amd.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
(v1) Tested-by: Dave Witbrodt <dawitbro@sbcglobal.net>
(v2) Refine radeon_flip_work_func() for better efficiency:
In radeon_flip_work_func, replace the busy waiting udelay(5)
with event lock held by a more performance and energy efficient
usleep_range() until at least predicted true start of hw vblank,
with some slack for scheduler happiness. Release the event lock
during waits to not delay other outputs in doing their stuff, as
the waiting can last up to 200 usecs in some cases.
Retested on DCE-3 and DCE-4 to verify it still works nicely.
(v2) Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
HPD signals on DVI ports can be fired off before the pins required for
DDC probing actually make contact, due to the pins for HPD making
contact first. This results in a HPD signal being asserted but DDC
probing failing, resulting in hotplugging occasionally failing.
This is somewhat rare on most cards (depending on what angle you plug
the DVI connector in), but on some cards it happens constantly. The
Radeon R5 on the machine used for testing this patch for instance, runs
into this issue just about every time I try to hotplug a DVI monitor and
as a result hotplugging almost never works.
Rescheduling the hotplug work for a second when we run into an HPD
signal with a failing DDC probe usually gives enough time for the rest
of the connector's pins to make contact, and fixes this issue.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lyude <cpaul@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
there is a protection fault about freed list when OCL test.
add a spin lock to protect it.
v2: drop changes in vm_fini
Signed-off-by: JimQu <jim.qu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
|
|
VM_CONTEXT*_PAGE_TABLE_END_ADDR" v2
The gtt_end is already inclusive, we don't need to subtract one here.
v2 (chk): keep the fix for the VM code, cause here it really applies.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Anatoli Antonovitch <anatoli.antonovitch@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
No need for a GEM reference here.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
No need for the GEM reference here.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Not necessary for VRAM.
v2: no need to check if ttm is NULL.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Commit e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's
uncachedness") modified the logic to test whether a HYP or stage-2
mapping needs flushing, from [incorrectly] interpreting the page table
attributes to [incorrectly] checking whether the PFN that backs the
mapping is covered by host system RAM. The PFN number is part of the
output of the translation, not the input, so we have to use pte_pfn()
on the contents of the PTE, not __phys_to_pfn() on the HYP virtual
address or stage-2 intermediate physical address.
Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
Cc: stable@vger.kernel.org
Tested-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Using oldstyle vcpu_reg() accessor is proven to be inappropriate and
unsafe on ARM64. This patch converts the rest of use cases to new
accessors and completely removes vcpu_reg() on ARM64.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
System register accesses also use zero register for Rt == 31, and
therefore using it will also result in getting SP value instead. This
patch makes them also using new accessors, introduced by the previous
patch. Since register value is no longer directly associated with storage
inside vCPU context structure, we introduce a dedicated storage for it in
struct sys_reg_params.
This refactor also gets rid of "massive hack" in kvm_handle_cp_64().
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Further rework is going to introduce a dedicated storage for transfer
register value in struct sys_reg_params. Before doing this we have to
remove 'const' modifiers from it in all accessor functions and their
callers.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
On ARM64 register index of 31 corresponds to both zero register and SP.
However, all memory access instructions, use ZR as transfer register. SP
is used only as a base register in indirect memory addressing, or by
register-register arithmetics, which cannot be trapped here.
Correct emulation is achieved by introducing new register accessor
functions, which can do special handling for reg_num == 31. These new
accessors intentionally do not rely on old vcpu_reg() on ARM64, because
it is to be removed. Since the affected code is shared by both ARM
flavours, implementations of these accessors are also added to ARM32 code.
This patch fixes setting MMIO register to a random value (actually SP)
instead of zero by something like:
*((volatile int *)reg) = 0;
compilers tend to generate "str wzr, [xx]" here
[Marc: Fixed 32bit splat]
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Commit 4e752f0ab0e8 ("rbd: access snapshot context and mapping size
safely") moved ceph_get_snap_context() out of rbd_img_request_create()
and into rbd_queue_workfn(), adding a ceph_put_snap_context() to the
error path in rbd_queue_workfn(). However, rbd_img_request_create()
consumes a ref on snapc, so calling ceph_put_snap_context() after
a successful rbd_img_request_create() leads to an extra put. Fix it.
Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
|
|
Oleg noticed that its possible to falsely observe p->on_cpu == 0 such
that we'll prematurely continue with the wakeup and effectively run p on
two CPUs at the same time.
Even though the overlap is very limited; the task is in the middle of
being scheduled out; it could still result in corruption of the
scheduler data structures.
CPU0 CPU1
set_current_state(...)
<preempt_schedule>
context_switch(X, Y)
prepare_lock_switch(Y)
Y->on_cpu = 1;
finish_lock_switch(X)
store_release(X->on_cpu, 0);
try_to_wake_up(X)
LOCK(p->pi_lock);
t = X->on_cpu; // 0
context_switch(Y, X)
prepare_lock_switch(X)
X->on_cpu = 1;
finish_lock_switch(Y)
store_release(Y->on_cpu, 0);
</preempt_schedule>
schedule();
deactivate_task(X);
X->on_rq = 0;
if (X->on_rq) // false
if (t) while (X->on_cpu)
cpu_relax();
context_switch(X, ..)
finish_lock_switch(X)
store_release(X->on_cpu, 0);
Avoid the load of X->on_cpu being hoisted over the X->on_rq load.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Explain how the control dependency and smp_rmb() end up providing
ACQUIRE semantics and pair with smp_store_release() in
finish_lock_switch().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
/proc/stats shows invalid gtime when the thread is running in guest.
When vtime accounting is not enabled, we cannot get a valid delta.
The delta is calculated with now - tsk->vtime_snap, but tsk->vtime_snap
is only updated when vtime accounting is runtime enabled.
This patch makes task_gtime() just return gtime without computing the
buggy non-existing tickless delta when vtime accounting is not enabled.
Use context_tracking_is_enabled() to check if vtime is accounting on
some cpu, in which case only we need to check the tickless delta. This
way we fix the gtime value regression on machines not running nohz full.
The kernel config contains CONFIG_VIRT_CPU_ACCOUNTING_GEN=y and
CONFIG_NO_HZ_FULL_ALL=n and boot without nohz_full.
I ran and stop a busy loop in VM and see the gtime in host.
Dump the 43rd field which shows the gtime in every second:
# while :; do awk '{print $3" "$43}' /proc/3955/task/4014/stat; sleep 1; done
S 4348
R 7064566
R 7064766
R 7064967
R 7065168
S 4759
S 4759
During running busy loop, it returns large value.
After applying this patch, we can see right gtime.
# while :; do awk '{print $3" "$43}' /proc/10913/task/10956/stat; sleep 1; done
S 5338
R 5365
R 5465
R 5566
R 5666
S 5726
S 5726
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447948054-28668-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
root_domain::rto_mask allocated through alloc_cpumask_var()
contains garbage data, this may cause problems. For instance,
When doing pull_rt_task(), it may do useless iterations if
rto_mask retains some extra garbage bits. Worse still, this
violates the isolated domain rule for clustered scheduling
using cpuset, because the tasks(with all the cpus allowed)
belongs to one root domain can be pulled away into another
root domain.
The patch cleans the garbage by using zalloc_cpumask_var()
instead of alloc_cpumask_var() for root_domain::rto_mask
allocation, thereby addressing the issues.
Do the same thing for root_domain's other cpumask memembers:
dlo_mask, span, and online.
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1449057179-29321-1-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Because wakeups can (fundamentally) be late, a task might not be in
the expected state. Therefore testing against a task's state is racy,
and can yield false positives.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: oleg@redhat.com
Fixes: 9067ac85d533 ("wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task")
Link: http://lkml.kernel.org/r/1448933660-23082-1-git-send-email-sasha.levin@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Vladimir reported getting RCU stall warnings and bisected it back to
commit:
743162013d40 ("sched: Remove proliferation of wait_on_bit() action functions")
That commit inadvertently reversed the calls to schedule() and signal_pending(),
thereby not handling the case where the signal receives while we sleep.
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: mark.rutland@arm.com
Cc: neilb@suse.de
Cc: oleg@redhat.com
Fixes: 743162013d40 ("sched: Remove proliferation of wait_on_bit() action functions")
Fixes: cbbce8220949 ("SCHED: add some "wait..on_bit...timeout()" interfaces.")
Link: http://lkml.kernel.org/r/20151201130404.GL3816@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Recent PAT patchset has caused issue on 32-bit PAE machines:
page:eea45000 count:0 mapcount:-128 mapping: (null) index:0x0 flags: 0x40000000()
page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) < 0)
------------[ cut here ]------------
kernel BUG at /home/build/linux-boris/mm/huge_memory.c:1485!
invalid opcode: 0000 [#1] SMP
[...]
Call Trace:
unmap_single_vma
? __wake_up
unmap_vmas
unmap_region
do_munmap
vm_munmap
SyS_munmap
do_fast_syscall_32
? __do_page_fault
sysenter_past_esp
Code: ...
EIP: [<c11bde80>] zap_huge_pmd+0x240/0x260 SS:ESP 0068:f6459d98
The problem is in pmd_pfn_mask() and pmd_flags_mask(). These
helpers use PMD_PAGE_MASK to calculate resulting mask.
PMD_PAGE_MASK is 'unsigned long', not 'unsigned long long' as
phys_addr_t is on 32-bit PAE (ARCH_PHYS_ADDR_T_64BIT). As a
result, the upper bits of resulting mask get truncated.
pud_pfn_mask() and pud_flags_mask() aren't problematic since we
don't have PUD page table level on 32-bit systems, but it's
reasonable to keep them consistent with PMD counterpart.
Introduce PHYSICAL_PMD_PAGE_MASK and PHYSICAL_PUD_PAGE_MASK in
addition to existing PHYSICAL_PAGE_MASK and reworks helpers to
use them.
Reported-and-Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[ Fix -Woverflow warnings from the realmode code. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jürgen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: linux-mm <linux-mm@kvack.org>
Fixes: f70abb0fc3da ("x86/asm: Fix pud/pmd interfaces to handle large PAT bit")
Link: http://lkml.kernel.org/r/1448878233-11390-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We've had human readable connector status change debug logging since
commit ed7951dc13aad4a14695ec8122e9f0e2ef25d39e
Author: Lespiau, Damien <damien.lespiau@intel.com>
Date: Fri May 10 12:36:42 2013 +0000
drm: Make the HPD status updates debug logs more readable
but
commit 162b6a57ac50eec236530a16c071ffa50e87362a
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Jan 21 08:45:21 2015 +0100
drm/probe-helper: don't lose hotplug event
added a new one with just the numbers. Fix it.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449144003-2877-1-git-send-email-jani.nikula@intel.com
|
|
Apparently pre-nv50 pageflip events happen before the actual vblank
period. Therefore that functionality got semi-disabled in
commit af4870e406126b7ac0ae7c7ce5751f25ebe60f28
Author: Mario Kleiner <mario.kleiner.de@gmail.com>
Date: Tue May 13 00:42:08 2014 +0200
drm/nouveau/kms/nv04-nv40: fix pageflip events via special case.
Unfortunately that hack got uprooted in
commit cc1ef118fc099295ae6aabbacc8af94d8d8885eb
Author: Thierry Reding <treding@nvidia.com>
Date: Wed Aug 12 17:00:31 2015 +0200
drm/irq: Make pipe unsigned and name consistent
Triggering a warning when trying to sample the vblank timestamp for a
non-existing pipe. There's a few ways to fix this:
- Open-code the old behaviour, which just enshrines this slight
breakage of the userspace ABI.
- Revert Mario's commit and again inflict broken timestamps, again not
pretty.
- Fix this for real by delaying the pageflip TS until the next vblank
interrupt, thereby making it accurate.
This patch implements the third option. Since having a page flip
interrupt that happens when the pageflip gets armed and not when it
completes in the next vblank seems to be fairly common (older i915 hw
works very similarly) create a new helper to arm vblank events for
such drivers.
v2 (Mario Kleiner):
- Fix function prototypes in drmP.h
- Add missing vblank_put() for pageflip completion without
pageflip event.
- Initialize sequence number for queued pageflip event to avoid
trouble in drm_handle_vblank_events().
- Remove dead code and spelling fix.
v3 (Mario Kleiner):
- Add a signed-off-by and cc stable tag per Ilja's advice.
v4 (Thierry Reding):
- Fix kerneldoc typo, discovered by Michel Dänzer
- Rearrange tags and changelog
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=106431
Cc: Thierry Reding <treding@nvidia.com>
Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: stable@vger.kernel.org # v4.3
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
A client calling drmSetMaster() using a file descriptor that was opened
when another client was master would inherit the latter client's master
object and all its authenticated clients.
This is unwanted behaviour, and when this happens, instead allocate a
brand new master object for the client calling drmSetMaster().
Fixes a BUG() throw in vmw_master_set().
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
It looks like these meant to be unreffing the
of_parse_phandle_with_args() node, since the error paths above it
don't do of_node_put. That function returns a new ref in pd_args.np,
though, not a new ref on dev->of_node. Also, it would have leaked the
ref in the success case.
Fixes "ERROR: Bad of_node_put()" on bcm2835 in the -EPROBE_DEFER case.
Fixes: aa42240ab254 (PM / Domains: Add generic OF-based PM domain look-up)
Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>
Cc: 3.18+ <stable@vger.kernel.org> # 3.18+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
It is possible to address another chip on same MDIO bus. The case is
correctly handled for media advertising. It is taken into account
only if mii_data->phy_id == phydev->addr. However, this condition
was missing for reset case.
Signed-off-by: Jérôme Pouiller <jezz@sysmic.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Call bnxt_cfg_rx_mode() in bnxt_init_chip() to setup uc_list and
mc_list mac address filters. Before the patch, uc_list is not
setup again after chip reset (such as ethtool ring size change)
and macvlans don't work any more after that.
Modify bnxt_cfg_rx_mode() to return error codes appropriately so
that the init chip sequence can detect any failures.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For PF, the bp->pf.mac_addr always holds the permanent MAC
addr assigned by the HW. For VF, the bp->vf.mac_addr always
holds the administrator assigned VF MAC addr. The random
generated VF MAC addr should never get stored to bp->vf.mac_addr.
This way, when the VF wants to change the MAC address, we can tell
if the adminstrator has already set it and disallow the VF from
changing it.
v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The existing ndo_set_mac_address only copies the new MAC addr
and didn't set the new MAC addr to the HW. The correct way is
to delete the existing default MAC filter from HW and add
the new one. Because of RFS filters are also dependent on the
default mac filter l2 context, the driver must go thru
close_nic() to delete the default MAC and RFS filters, then
open_nic() to set the default MAC address to HW.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the driver is used on an ARM platform with SPARSE_IRQ defined,
semantics of NR_IRQS is different (minimal value of virtual irqs) and
by default it is set to 16, see arch/arm/include/asm/irq.h.
This value may be less than the actual number of virtual irqs, which
may break the driver initialization. The check removal allows to use
the driver on such a platform, and, if irq controller driver works
correctly, the check is not needed on legacy platforms.
Fixes a runtime problem:
lpc-eth 31060000.ethernet: error getting resources.
lpc_eth: lpc-eth: not found (-6).
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
devices.
One problem is that it updates sch->q.qlen and sch->qstats.drops
on the mq/mqprio root qdisc, while it should not : Daniele
reported underflows errors :
[ 681.774821] PAX: sch->q.qlen: 0 n: 1
[ 681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
[ 681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G O 4.2.6.201511282239-1-grsec #1
[ 681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
[ 681.774956] ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
[ 681.774959] ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
[ 681.774960] ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
[ 681.774962] Call Trace:
[ 681.774967] [<ffffffffa95d2810>] dump_stack+0x4c/0x7f
[ 681.774970] [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50
[ 681.774972] [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
[ 681.774976] [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
[ 681.774978] [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
[ 681.774980] [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
[ 681.774983] [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160
[ 681.774985] [<ffffffffa90664c1>] __do_softirq+0xf1/0x200
[ 681.774987] [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
[ 681.774989] [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
[ 681.774991] [<ffffffffa9089560>] ? sort_range+0x40/0x40
[ 681.774992] [<ffffffffa9085fe4>] kthread+0xe4/0x100
[ 681.774994] [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
[ 681.774995] [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70
mq/mqprio have their own ways to report qlen/drops by folding stats on
all their queues, with appropriate locking.
A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
without proper locking : concurrent qdisc updates could corrupt the list
that qdisc_match_from_root() parses to find a qdisc given its handle.
Fix first problem adding a TCQ_F_NOPARENT qdisc flag that
qdisc_tree_decrease_qlen() can use to abort its tree traversal,
as soon as it meets a mq/mqprio qdisc children.
Second problem can be fixed by RCU protection.
Qdisc are already freed after RCU grace period, so qdisc_list_add() and
qdisc_list_del() simply have to use appropriate rcu list variants.
A future patch will add a per struct netdev_queue list anchor, so that
qdisc_tree_decrease_qlen() can have more efficient lookups.
Reported-by: Daniele Fucini <dfucini@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference
to the underlying tunnel device, but never released it when such
device is deleted.
Deleting the underlying device via the ip tool cause the kernel to
hangup in the netdev_wait_allrefs() loop.
This commit ensure that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport")
Fixes: 6b001e682e90 ("openvswitch: Use Geneve device.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The non-PCI builds of the O day test project are failing:
On Thu, 2015-12-03 at 05:02 +0800, kbuild test robot wrote:
> warning: (SCSI_MPT2SAS) selects SCSI_MPT3SAS which has unmet direct
> dependencies (SCSI_LOWLEVEL && PCI && SCSI)
The problem is that select and depend don't interact because Kconfig doesn't
have a SAT solver, so depend picks up dependencies and select does onward
selects, but select doesn't pick up dependencies. To fix this, we need to add
the correct dependencies to the MPT2SAS option like this.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: b840c3627b6f4f856b333a14a72f8ed86da2f86c
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
|
When a multicast group is joined on a socket, a struct ip_mc_socklist
is appended to the sockets mc_list containing information about the
joined group.
If the interface is hot unplugged, this entry becomes stale. Prior to
commit 52ad353a5344f ("igmp: fix the problem when mc leave group") it
was possible to remove the stale entry by performing a
IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on
the interface. However, this fix enforces that the interface must
still exist. Thus with time, the number of stale entries grows, until
sysctl_igmp_max_memberships is reached and then it is not possible to
join and more groups.
The previous patch fixes an issue where a IP_DROP_MEMBERSHIP is
performed without specifying the interface, either by ifindex or ip
address. However here we do supply one of these. So loosen the
restriction on device existence to only apply when the interface has
not been specified. This then restores the ability to clean up the
stale entries.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 52ad353a5344f "(igmp: fix the problem when mc leave group")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets.
We need to call inet6_destroy_sock() to properly release
inet6 specific fields.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
aarch64 doesn't have native store immediate instruction, such operation
has to be implemented by the below instruction sequence:
Load immediate to register
Store register
Signed-off-by: Yang Shi <yang.shi@linaro.org>
CC: Zi Shen Lim <zlim.lnx@gmail.com>
CC: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While testing the np->opt RCU conversion, I found that UDP/IPv6 was
using a mixture of xchg() and sk_dst_lock to protect concurrent changes
to sk->sk_dst_cache, leading to possible corruptions and crashes.
ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
way to fix the mess is to remove sk_dst_lock completely, as we did for
IPv4.
__ip6_dst_store() and ip6_dst_store() share same implementation.
sk_setup_caps() being called with socket lock being held or not,
we have to use sk_dst_set() instead of __sk_dst_set()
Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
in ip6_dst_store() before the sk_setup_caps(sk, dst) call.
This is because ip6_dst_store() can be called from process context,
without any lock held.
As soon as the dst is installed in sk->sk_dst_cache, dst can be freed
from another cpu doing a concurrent ip6_dst_store()
Doing the dst dereference before doing the install is needed to make
sure no use after free would trigger.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch completes the work I did in commit 45f6fad84cc3
("ipv6: add complete rcu protection around np->opt"), as I missed
sctp part.
This simply makes sure np->opt is used with proper RCU locking
and accessors.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This can happen when we run out of encoders for a multi-crtc modeset,
or also when userspace is silly and tries to clone multiple connectors
that need the same encoder on the same crtc.
Reported-and-Tested-and-Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1449136154-11588-1-git-send-email-daniel.vetter@ffwll.ch
|
|
Proxy entries could have null pointer to net-device.
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Fixes: 84920c1420e2 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dmitry Vyukov reported that the user could trigger a kernel warning by
using a large len value for getsockopt SCTP_GET_LOCAL_ADDRS, as that
value directly affects the value used as a kmalloc() parameter.
This patch thus switches the allocation flags from all user-controllable
kmalloc size to GFP_USER to put some more restrictions on it and also
disables the warn, as they are not necessary.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
They don't need to be any bigger than that and with this we start a new
bitfield for tracking association runtime stuff, like zero window
situation.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch addresses multiple problems :
UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.
Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())
This patch adds full RCU protection to np->opt
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For large map->value_size the user space can trigger memory allocation warnings like:
WARNING: CPU: 2 PID: 11122 at mm/page_alloc.c:2989
__alloc_pages_nodemask+0x695/0x14e0()
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff82743b56>] dump_stack+0x68/0x92 lib/dump_stack.c:50
[<ffffffff81244ec9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460
[<ffffffff812450f9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
[< inline >] __alloc_pages_slowpath mm/page_alloc.c:2989
[<ffffffff81554e95>] __alloc_pages_nodemask+0x695/0x14e0 mm/page_alloc.c:3235
[<ffffffff816188fe>] alloc_pages_current+0xee/0x340 mm/mempolicy.c:2055
[< inline >] alloc_pages include/linux/gfp.h:451
[<ffffffff81550706>] alloc_kmem_pages+0x16/0xf0 mm/page_alloc.c:3414
[<ffffffff815a1c89>] kmalloc_order+0x19/0x60 mm/slab_common.c:1007
[<ffffffff815a1cef>] kmalloc_order_trace+0x1f/0xa0 mm/slab_common.c:1018
[< inline >] kmalloc_large include/linux/slab.h:390
[<ffffffff81627784>] __kmalloc+0x234/0x250 mm/slub.c:3525
[< inline >] kmalloc include/linux/slab.h:463
[< inline >] map_update_elem kernel/bpf/syscall.c:288
[< inline >] SYSC_bpf kernel/bpf/syscall.c:744
To avoid never succeeding kmalloc with order >= MAX_ORDER check that
elem->value_size and computed elem_size are within limits for both hash and
array type maps.
Also add __GFP_NOWARN to kmalloc(value_size | elem_size) to avoid OOM warnings.
Note kmalloc(key_size) is highly unlikely to trigger OOM, since key_size <= 512,
so keep those kmalloc-s as-is.
Large value_size can cause integer overflows in elem_size and map.pages
formulas, so check for that as well.
Fixes: aaac3ba95e4c ("bpf: charge user for creation of BPF maps and programs")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|