| Commit message (Collapse) | Author | Files | Lines |
|
Try to get irq_desc on the same node as create_irq_nr().
[ Impact: optimization, make HT IRQs more NUMA-aware ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <49F655B6.8020109@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Try to get irq_desc on the home node in create_irq_nr().
v2: don't check if we can move it when sparse_irq is not used
v3: use move_irq_desc() if that node is not the one we want
[ Impact: optimization, make MSI IRQ descriptors more NUMA aware ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <49F6559F.7070005@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Make actual use of the device parameter passed down to
io_apic_set_pci_routing() - to have the IRQ descriptor
on the home node of the device.
If no device has been passed down, we assume it's a platform
device and use the boot node ID for the IRQ descriptor.
[ Impact: optimization, make IO-APIC code more NUMA aware ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <49F6557E.3080101@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
We want to use dev_to_node() later on, to be aware of the 'home node'
of the GSI in question.
[ Impact: cleanup, prepare the IRQ code to be more NUMA aware ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Len Brown <lenb@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Len Brown <lenb@kernel.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
LKML-Reference: <49F65560.20904@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
This simplifies the node awareness of the code. All our allocators
only deal with a NUMA node ID locality not with CPU ids anyway - so
there's no need to maintain (and transform) a CPU id all across the
IRQ layer.
v2: keep move_irq_desc related
[ Impact: cleanup, prepare IRQ code to be NUMA-aware ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
LKML-Reference: <49F65536.2020300@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
irq_set_affinity() and move_masked_irq() try to assign affinity
before calling chip set_affinity(). Some archs are assigning it
in ->set_affinity() again.
We do something like:
cpumask_copy(desc->affinity, mask);
desc->chip->set_affinity(mask);
But in the failure path, affinity should not be touched - otherwise
we'll end up with a different affinity mask despite the failure to
migrate the IRQ.
So try to update the affinity only if set_affinity returns 0.
Also call irq_set_thread_affinity accordingly.
v2: update after "irq, x86: Remove IRQ_DISABLED check in process context IRQ move"
v3: according to Ingo, change set_affinity() in irq_chip should return int.
v4: update comments by removing moving irq_desc code.
[ Impact: fix /proc/irq/*/smp_affinity setting corner case bug ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <49F65509.60307@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
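The failure-path rule described in this commit can be sketched in plain C as follows. This is a minimal user-space illustration, not the genirq code: mask_t, chip_sketch, desc_sketch and set_affinity_sketch() are hypothetical stand-ins for the kernel's cpumask, irq_chip and irq_desc types.

```c
#include <errno.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU. */
typedef unsigned long mask_t;

/* Hypothetical stand-in for irq_chip: set_affinity() returns 0 on
 * success and a negative errno value on failure. */
struct chip_sketch {
    int (*set_affinity)(unsigned int irq, mask_t mask);
};

/* Hypothetical stand-in for irq_desc. */
struct desc_sketch {
    mask_t affinity;
    const struct chip_sketch *chip;
};

/* Ask the chip first and record the new mask only if the chip
 * accepted it, so a failed migration leaves the old mask intact. */
static int set_affinity_sketch(struct desc_sketch *desc,
                               unsigned int irq, mask_t mask)
{
    int ret = desc->chip->set_affinity(irq, mask);

    if (ret == 0)
        desc->affinity = mask;
    return ret;
}

/* Two toy chips: one always succeeds, one always fails. */
static int chip_ok(unsigned int irq, mask_t mask)
{
    (void)irq; (void)mask;
    return 0;
}

static int chip_fail(unsigned int irq, mask_t mask)
{
    (void)irq; (void)mask;
    return -EINVAL;
}
```

With chip_fail installed, a call to set_affinity_sketch() returns the error and desc->affinity keeps its previous value, which is the behaviour the commit asks for.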
|
|
According to Ingo, set_affinity() in irq_chip should return int,
because that way we can handle failure cases in a much cleaner way in
the genirq layer.
v2: fix two typos
[ Impact: extend API ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-arch@vger.kernel.org
LKML-Reference: <49F654E9.4070809@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The original feature of migrating irq_desc dynamically was too fragile
and was causing problems: it caused crashes on systems with lots of
cards with MSI-X when user-space irq-balancer was enabled.
We now have new patches that create irq_desc according to device
numa node. This patch removes the leftover bits of the dynamic balancer.
[ Impact: remove dead code ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <49F654AF.8000808@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
CPUMASKS_OFFSTACK is not defined anywhere (it is CPUMASK_OFFSTACK).
It is a typo, and init_alloc_desc_masks() is called before the
affinity is set to all CPUs...
Split init_alloc_desc_masks() into alloc_desc_masks() and init_desc_masks().
Also use CPUMASK_OFFSTACK in alloc_desc_masks().
[ Impact: fix smp_affinity copying/setup when moving irq_desc between CPUs ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
LKML-Reference: <49F6546E.3040406@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Stop gcc from generating uninitialised variable warnings after BUG().
The problem is that FRV's call into its gdbstub appears to return (if
the function is marked noreturn, then the compiler is under no
obligation to pass it a return address, and so GDB won't know where the
bug happened).
To get around this, we make the do...while wrapper in _debug_bug_trap()
an endless loop from which there's no escape.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Wire up new system calls for the FRV arch (preadv and pwritev).
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ptrace_attach() needs task->cred_exec_mutex, not current->cred_exec_mutex.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
|
|
syscall_nr is presently defined as unsigned in the SH-5 pt_regs,
while the syscall restarting code wants it to be signed. Fix this
up, and bring it in line with the other SH parts.
Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Systems with two or more sockets (8 cores, 2 threads each) are on the
way, but we treat them as multi-chassis systems even though they could
have a stable TSC domain.
Use DMI check instead.
[ Impact: do not turn possibly stable TSCs off incorrectly ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
LKML-Reference: <49F5532A.5000802@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
XAPIC_DEST_* duplicates the definitions in apicdef.h.
[ Impact: cleanup ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49F552D0.5050505@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The maple mouse driver currently in mainline is broken:
bash-3.1# modprobe maplemouse
[ 56.886378] input: Dreamcast Mouse as /devices/virtual/input/input3
[ 56.918379] Unable to handle kernel NULL pointer dereference at virtual address 00000004
[ 56.930543] pc = c003304e
[ 56.934973] *pde = 00000000
[ 56.944948] Oops: 0000 [#1]
[ 56.947867] Modules linked in: maplemouse(+)
[ 56.952353]
[ 56.953921] Pid : 1157, Comm: \0x09\0x09modprobe
[ 56.958021] CPU : 0 \0x09\0x09Not tainted (2.6.30-rc2-00130-g3e98f9f #1)
[ 56.958052]
[ 56.966567] PC is at dc_mouse_open+0xe/0x40 [maplemouse]
[ 56.972125] PR is at input_open_device+0x8a/0xc0
[ 56.976944] PC : c003304e SP : 8c88bdcc SR : 40008100 TEA : c0033834
[ 56.983854] R0 : 000006c4 R1 : 00000000 R2 : 40008101 R3 : 00000000
[ 56.990744] R4 : 8c8db800 R5 : c0033080 R6 : 00000005 R7 : 00000200
[ 56.997635] R8 : 8c8db800 R9 : 8c8dbe3c R10 : 00000000 R11 : 8c98881c
[ 57.004525] R12 : 8c8dbe64 R13 : 8ca50140 R14 : 8c88bdd4
[ 57.010063] MACH: 00000497 MACL: 00000348 GBR : 29674440 PR : 8c1b4d0a
[ 57.016939]
...
Here is a fix for this, keeping an open and close, so reducing
the load on the system when the mouse is not in use, and also properly
referencing the maple device buffer following the recent update.
Signed-off-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Fix the problem that 29-bit mode does not work when using sh7785lcr_defconfig.
Signed-off-by: Yoshihiro Shimoda <shimoda.yoshihiro@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
This has the consequence of changing the section name used for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This has the consequence of changing the section name used for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Paul Mundt <lethal@linux-sh.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This has the consequence of changing the section name used for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This has the consequence of changing the section name used for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This has the consequence of changing the section name used for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This has the consequence of changing the section name used for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This has the consequence of changing the section name used for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This has the consequence of changing the section name used for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This has the consequence of changing the section name used for head
code from ".text.head" to ".head.text". Since this commit changes all
users in the architecture, this change should be harmless.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Richard Henderson <rth@twiddle.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch is preparation for replacing all uses of ".head.text" or
".text.head" in the kernel with macros, so that the section name can
later be changed without having to touch a lot of the kernel.
Since some linker scripts do more complex things than referencing
HEAD_TEXT, we add a HEAD_TEXT_SECTION macro that just contains the
actual name.
I've defined HEAD_TEXT_SECTION in a new header,
include/linux/section-names.h, so that this section name only needs to
appear in one place. I anticipate creating similar macro structures
for a number of other section names.
The long-term goal here is to be able to change the kernel's magic
section names to those that are compatible with -ffunction-sections
-fdata-sections. This requires renaming all magic sections with names
of the form ".text.foo".
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The EXTENTS_FL flag should never be set on special files, but if it
is, don't bother trying to validate that the extents tree is valid,
since only files, directories, and non-fast symlinks will ever have an
extent data structure. We perhaps should flag the filesystem as being
corrupted if we see a special file (named pipes, device nodes, Unix
domain sockets, etc.) with the EXTENTS_FL flag, but e2fsck doesn't
currently check this case, so we'll just ignore this for now, since
it's harmless.
Without this fix, a special device with the extents flag is flagged as
an error by the kernel, so it is impossible to access or delete the
inode, but e2fsck doesn't see it as a problem, leading to
confused/frustrated users.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Commit c751085943362143f84346d274e0011419c84202 ("PM/Hibernate: Wait for
SCSI devices scan to complete during resume") added a call to
scsi_complete_async_scans() to software_resume(), so that it waited for
the SCSI scanning to complete, but the call was added at a wrong place.
Namely, it should have been added after wait_for_device_probe(), which
is called only if the image partition hasn't been specified yet. Also,
it's reasonable to check if the image partition is present and only wait
for the device probing and SCSI scanning to complete if it is not the
case.
Additionally, since noresume is checked right at the beginning of
software_resume() and the function returns immediately if it's set, it
doesn't make sense to check it once again later.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
RomFS should advance the destination buffer pointer when reading data from a
blockdev source (the data may be split over multiple blocks, each requiring its
own sb_read() call). Without this, all the data is copied to the beginning of
the output buffer.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
romfs_lookup() should be using a routine akin to strcmp() on the backing store,
rather than one akin to strncmp(). If it uses the latter, it's liable to match
/bin/shutdown when looking up /bin/sh.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently, although find_last_bit is EXPORTed, it is statically linked
with the kernel and is referenced only under CONFIG_SMP.
When CONFIG_SMP is undefined and find_last_bit is referenced only by
modules, linking fails with:
ERROR: "find_last_bit" [fs/nfs/nfs.ko] undefined!
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Adjust the CacheFiles documentation to use the correct names of the credential
pointers in task_struct.
The documentation was using names from the old versions of the credentials
patches.
Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The virtio-rng driver checks for spurious callbacks. Since
callbacks can be implemented via shared interrupts (e.g. PCI) this
could lead to guest kernel oopses with lots of virtio devices.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Don't try to look at i_file_acl_high unless the INCOMPAT_64BIT feature
bit is set. The field is normally zero, but older versions of e2fsck
didn't automatically check to make sure of this, so in the spirit of
"be liberal in what you accept", don't look at i_file_acl_high unless
we are using a 64-bit filesystem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
If the block containing external extended attributes (which is stored
in i_file_acl and i_file_acl_high) is larger than the on-disk
filesystem, the process which tried to access the extended attributes
will endlessly issue kernel printks complaining that
"__find_get_block_slow() failed", locking up that CPU until the system
is forcibly rebooted.
So when we read in the inode, make sure the i_file_acl value is legal,
and if not, flag the filesystem as being corrupted.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The release path for a disconnected device frees the object then unlocks
the mutex in the freed object...
Found by Dan Carpenter using Smatch
Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove my name and email address from the note in the source. Wincor Nixdorf
only has some ITE chips on its mainboards; other chips are not
available to me for testing.
Signed-off-by: Niels de Vos <niels.devos@wincor-nixdorf.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Wrong types on IRQ handler
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Update the defconfig for the ASB2303 evaluation board.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Slow-work appears to delete its timer as soon as the first user
unregisters, even though other users could be active. At the same time, it
never seems to delete slow_work_oom_timer. Arrange for both to happen in
the shutdown path.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
write_lock(&current->fs->lock) guarantees we can't wrongly miss
LSM_UNSAFE_SHARE, this is what we care about. Use rcu_read_lock()
instead of ->siglock to iterate over the sub-threads. We must see
all CLONE_THREAD|CLONE_FS threads which didn't pass exit_fs(), it
takes fs->lock too.
With or without this patch we can miss the freshly cloned thread
and set LSM_UNSAFE_SHARE, we don't care.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
[ Fixed lock/unlock typo - Hugh ]
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If do_execve() fails after check_unsafe_exec(), it clears fs->in_exec
unconditionally. This is wrong if we race with our sub-thread which
also does do_execve:
Two threads T1 and T2 and another process P, all share the same
->fs.
T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(), since
->fs is shared, we set LSM_UNSAFE but not ->in_exec.
P exits and decrements fs->users.
T2 starts do_execve(), calls check_unsafe_exec(), now ->fs is not
shared, we set fs->in_exec.
T1 continues, open_exec(BAD_FILE) fails, we clear ->in_exec and
return to the user-space.
T1 does clone(CLONE_FS /* without CLONE_THREAD */).
T2 continues without LSM_UNSAFE_SHARE while ->fs is shared with
another process.
Change check_unsafe_exec() to return res = 1 if we set ->in_exec, and change
do_execve() to clear ->in_exec depending on res.
When do_execve() succeeds, it is safe to clear ->in_exec unconditionally.
It can be set only if we don't share ->fs with another process, and since
we already killed all sub-threads either ->in_exec == 0 or we are the
only user of this ->fs.
Also, we do not need fs->lock to clear fs->in_exec.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently we look it up from ->ioprio, but ->ioprio can change if
either the process gets its IO priority changed explicitly, or if
cfq decides to temporarily boost it. So if we are unlucky, we can
end up attempting to remove a node from a different rbtree root than
where it was added.
Fix this by using ->org_ioprio as the prio_tree index, since that
will only change for explicit IO priority settings (not for a boost).
Additionally cache the rbtree root inside the cfqq, then we don't have
to add code to reinsert the cfqq in the prio_tree if IO priority changes.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
cfq_prio_tree_lookup() should return the direct match, yet it always
returns zero. Fix that.
cfq_prio_tree_add() assumes that we don't get a direct match, while
it is very possible that we do. Using O_DIRECT, you can have different
cfqq with matching requests, since you don't have the page cache
to serialize things for you. Fix this bug by only adding the cfqq if
there isn't an existing match.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Not strictly needed, but we should make it clear that we init the
rbtree roots here.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Very rarely under stress testing of dm, oopses are occurring as
something tampers with an old stack frame. This has been traced back
to blk_abort_queue() leaving a timeout_list pointing to the stack.
The reason is that sometimes blk_abort_request() won't delete the
timer (if the request is marked as complete but before the timer has
been removed, a small race window). Fix this by splicing back from
the usually empty list to the q->timeout_list.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
The umem driver issues two warnings on boot, due to blk_plug_device() and
blk_remove_plug() being called without q->queue_lock held. Starting with
e48ec690 (block: extend queue_flag bitops), the queue_flag_* functions
warn if q->queue_lock doesn't appear to be locked. In fact, q->queue_lock
is NULL (though that apparently isn't otherwise a problem as the driver is
using card->lock for everything).
Although blk_init_queue() will take a request_fn_proc and spinlock_t*,
there isn't a corresponding init helper that takes a make_request_fn.
Setting queue_lock to &card->lock explicitly seems to work fine for me.
The warning goes away and the device appears to behave.
[ 1.531881] v2.3 : Micro Memory(tm) PCI memory board block driver
[ 1.538136] umem 0000:02:01.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 1.545018] umem 0000:02:01.0: Micro Memory(tm) controller found (PCI Mem Module (Battery Backup))
[ 1.554176] umem 0000:02:01.0: CSR 0xfc9ffc00 -> 0xffffc200013d0c00 (0x100)
[ 1.561279] umem 0000:02:01.0: Size 1048576 KB, Battery 1 Disabled (FAILURE), Battery 2 Disabled (FAILURE)
[ 1.571114] umem 0000:02:01.0: Window size 16777216 bytes, IRQ 20
[ 1.577304] umem 0000:02:01.0: memory NOT initialized. Consider over-writing whole device.
[ 1.585989] umema:<4>------------[ cut here ]------------
[ 1.591775] WARNING: at include/linux/blkdev.h:492 blk_plug_device+0x6d/0x106()
[ 1.592025] Hardware name: H8SSL
[ 1.592025] Modules linked in:
[ 1.592025] Pid: 1, comm: swapper Not tainted 2.6.29 #8
[ 1.592025] Call Trace:
[ 1.592025] [<ffffffff8023c994>] warn_slowpath+0xd3/0xf2
[ 1.592025] [<ffffffff8025a5b5>] ? save_trace+0x3f/0x9b
[ 1.592025] [<ffffffff8025a68b>] ? add_lock_to_list+0x7a/0xba
[ 1.592025] [<ffffffff8025e609>] ? validate_chain+0xb3b/0xce8
[ 1.592025] [<ffffffff80441556>] ? mm_make_request+0x27/0x59
[ 1.592025] [<ffffffff80441556>] ? mm_make_request+0x27/0x59
[ 1.592025] [<ffffffff8025ef04>] ? __lock_acquire+0x74e/0x7b9
[ 1.592025] [<ffffffff8025a70e>] ? get_lock_stats+0x34/0x5e
[ 1.592025] [<ffffffff8025a746>] ? put_lock_stats+0xe/0x27
[ 1.592025] [<ffffffff80441556>] ? mm_make_request+0x27/0x59
[ 1.592025] [<ffffffff803ad165>] blk_plug_device+0x6d/0x106
[ 1.592025] [<ffffffff80441575>] mm_make_request+0x46/0x59
[ 1.592025] [<ffffffff803ac2d9>] generic_make_request+0x335/0x3cf
[ 1.592025] [<ffffffff8027fcc7>] ? mempool_alloc_slab+0x11/0x13
[ 1.592025] [<ffffffff8027fdce>] ? mempool_alloc+0x45/0x101
[ 1.592025] [<ffffffff8025a746>] ? put_lock_stats+0xe/0x27
[ 1.592025] [<ffffffff803adda5>] submit_bio+0x10a/0x119
[ 1.592025] [<ffffffff802c8d00>] submit_bh+0xe5/0x109
[ 1.592025] [<ffffffff802cbf43>] block_read_full_page+0x2aa/0x2cb
[ 1.592025] [<ffffffff802cf4c4>] ? blkdev_get_block+0x0/0x4c
[ 1.592025] [<ffffffff805c90a8>] ? _spin_unlock_irq+0x36/0x51
[ 1.592025] [<ffffffff80286836>] ? __lru_cache_add+0x92/0xb2
[ 1.592025] [<ffffffff802cf008>] blkdev_readpage+0x13/0x15
[ 1.592025] [<ffffffff8027de06>] read_cache_page_async+0x90/0x134
[ 1.592025] [<ffffffff802ceff5>] ? blkdev_readpage+0x0/0x15
[ 1.592025] [<ffffffff802f5f1c>] ? adfspart_check_ICS+0x0/0x16c
[ 1.592025] [<ffffffff8027deb8>] read_cache_page+0xe/0x45
[ 1.592025] [<ffffffff802f5170>] read_dev_sector+0x2e/0x93
[ 1.592025] [<ffffffff802f5f44>] adfspart_check_ICS+0x28/0x16c
[ 1.592025] [<ffffffff8025d427>] ? trace_hardirqs_on+0xd/0xf
[ 1.592025] [<ffffffff802f5f1c>] ? adfspart_check_ICS+0x0/0x16c
[ 1.592025] [<ffffffff802f59c5>] rescan_partitions+0x168/0x2fb
[ 1.592025] [<ffffffff802ceae9>] __blkdev_get+0x259/0x336
[ 1.592025] [<ffffffff803ca1e2>] ? kobject_put+0x47/0x4b
[ 1.592025] [<ffffffff802cebd1>] blkdev_get+0xb/0xd
[ 1.592025] [<ffffffff802f5773>] register_disk+0xc4/0x12b
[ 1.592025] [<ffffffff803b2a7b>] add_disk+0xc3/0x12d
[ 1.592025] [<ffffffff808a1d4a>] ? mm_init+0x0/0x1a5
[ 1.592025] [<ffffffff808a1e73>] mm_init+0x129/0x1a5
[ 1.592025] [<ffffffff808a1d4a>] ? mm_init+0x0/0x1a5
[ 1.592025] [<ffffffff80209056>] _stext+0x56/0x130
[ 1.592025] [<ffffffff80274932>] ? register_irq_proc+0xae/0xca
[ 1.592025] [<ffffffff802f0000>] ? proc_pid_lookup+0xb4/0x18b
[ 1.592025] [<ffffffff8087f975>] kernel_init+0x132/0x18b
[ 1.592025] [<ffffffff8020d17a>] child_rip+0xa/0x20
[ 1.592025] [<ffffffff8020cb40>] ? restore_args+0x0/0x30
[ 1.592025] [<ffffffff8087f843>] ? kernel_init+0x0/0x18b
[ 1.592025] [<ffffffff8020d170>] ? child_rip+0x0/0x20
[ 1.592025] ---[ end trace 7150b3b86da74e1e ]---
[ 1.889858] ------------[ cut here ]------------[ve_plug+0x5f/0x91()
[ 1.893848] Hardware name: H8SSL
[ 1.893848] Modules linked in:
[ 1.893848] Pid: 1, comm: swapper Tainted: G W 2.6.29 #8
[ 1.893848] Call Trace:
[ 1.893848] [<ffffffff8023c994>] warn_slowpath+0xd3/0xf2
[ 1.893848] [<ffffffff805c8411>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1.893848] [<ffffffff8020cb40>] ? restore_args+0x0/0x30
[ 1.893848] [<ffffffff80254245>] ? __atomic_notifier_call_chain+0x0/0xb2
[ 1.893848] [<ffffffff805c90a3>] ? _spin_unlock_irq+0x31/0x51
[ 1.893848] [<ffffffff805c90bf>] ? _spin_unlock_irq+0x4d/0x51
[ 1.893848] [<ffffffff8044157d>] ? mm_make_request+0x4e/0x59
[ 1.893848] [<ffffffff8025a70e>] ? get_lock_stats+0x34/0x5e
[ 1.893848] [<ffffffff8025a75d>] ? put_lock_stats+0x25/0x27
[ 1.893848] [<ffffffff80441504>] ? mm_unplug_device+0x25/0x50
[ 1.893848] [<ffffffff803acf23>] blk_remove_plug+0x5f/0x91
[ 1.893848] [<ffffffff8044150f>] mm_unplug_device+0x30/0x50
[ 1.893848] [<ffffffff803ab74a>] blk_unplug+0x78/0x7d
[ 1.893848] [<ffffffff803ab75c>] blk_backing_dev_unplug+0xd/0xf
[ 1.893848] [<ffffffff802c853c>] block_sync_page+0x4a/0x4c
[ 1.893848] [<ffffffff8027da1c>] sync_page+0x44/0x4d
[ 1.893848] [<ffffffff805c66fd>] __wait_on_bit_lock+0x42/0x8a
[ 1.893848] [<ffffffff8027d9d8>] ? sync_page+0x0/0x4d
[ 1.893848] [<ffffffff8027d9c4>] __lock_page+0x64/0x6b
[ 1.893848] [<ffffffff802508db>] ? wake_bit_function+0x0/0x2a
[ 1.893848] [<ffffffff8027de4a>] read_cache_page_async+0xd4/0x134
[ 1.893848] [<ffffffff802ceff5>] ? blkdev_readpage+0x0/0x15
[ 1.893848] [<ffffffff802f5f1c>] ? adfspart_check_ICS+0x0/0x16c
[ 1.893848] [<ffffffff8027deb8>] read_cache_page+0xe/0x45
[ 1.893848] [<ffffffff802f5170>] read_dev_sector+0x2e/0x93
[ 1.893848] [<ffffffff802f5f44>] adfspart_check_ICS+0x28/0x16c
[ 1.893848] [<ffffffff8025d427>] ? trace_hardirqs_on+0xd/0xf
[ 1.893848] [<ffffffff802f5f1c>] ? adfspart_check_ICS+0x0/0x16c
[ 1.893848] [<ffffffff802f59c5>] rescan_partitions+0x168/0x2fb
[ 1.893848] [<ffffffff802ceae9>] __blkdev_get+0x259/0x336
[ 1.893848] [<ffffffff803ca1e2>] ? kobject_put+0x47/0x4b
[ 1.893848] [<ffffffff802cebd1>] blkdev_get+0xb/0xd
[ 1.893848] [<ffffffff802f5773>] register_disk+0xc4/0x12b
[ 1.893848] [<ffffffff803b2a7b>] add_disk+0xc3/0x12d
[ 1.893848] [<ffffffff808a1d4a>] ? mm_init+0x0/0x1a5
[ 1.893848] [<ffffffff808a1e73>] mm_init+0x129/0x1a5
[ 1.893848] [<ffffffff808a1d4a>] ? mm_init+0x0/0x1a5
[ 1.893848] [<ffffffff80209056>] _stext+0x56/0x130
[ 1.893848] [<ffffffff80274932>] ? register_irq_proc+0xae/0xca
[ 1.893848] [<ffffffff802f0000>] ? proc_pid_lookup+0xb4/0x18b
[ 1.893848] [<ffffffff8087f975>] kernel_init+0x132/0x18b
[ 1.893848] [<ffffffff8020d17a>] child_rip+0xa/0x20
[ 1.893848] [<ffffffff8020cb40>] ? restore_args+0x0/0x30
[ 1.893848] [<ffffffff8087f843>] ? kernel_init+0x0/0x18b
[ 1.893848] [<ffffffff8020d170>] ? child_rip+0x0/0x20
[ 1.893848] ---[ end trace 7150b3b86da74e1f ]---
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
This simplifies I/O stat accounting switching code and separates it
completely from I/O scheduler switch code.
Requests are accounted according to the state of their request queue
at the time of the request allocation. There is no need anymore to
flush the request queue when switching I/O accounting state.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|