| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
Combine SG entries describing single contiguous memory block into one
Tsi721 BDMA descriptor. This reduces number of hardware descriptors
required for large data transfers and improves performance on the PCIe
side by reducing number of descriptor fetch requests.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zram is ram based block device and can be used by backend of filesystem.
When filesystem deletes a file, it normally doesn't do anything on data
block of that file. It just marks on metadata of that file. This
behavior has no problem on disk based block device, but has problems on
ram based block device, since we can't free memory used for data block.
To overcome this disadvantage, there is REQ_DISCARD functionality. If
block device support REQ_DISCARD and filesystem is mounted with discard
option, filesystem sends REQ_DISCARD to block device whenever some data
blocks are discarded. All we have to do is to handle this request.
This patch implements to flag up QUEUE_FLAG_DISCARD and handle this
REQ_DISCARD request. With it, we can free memory used by zram if it isn't
used.
[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
sysfs.txt documentation lists the following requirements:
- The buffer will always be PAGE_SIZE bytes in length. On i386, this
is 4096.
- show() methods should return the number of bytes printed into the
buffer. This is the return value of scnprintf().
- show() should always use scnprintf().
Use scnprintf() in show() functions.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we initialized zcomp with single, we couldn't change
max_comp_streams without zram reset but current interface doesn't show
any error to user and even it changes max_comp_streams's value without
any effect so it would make user very confusing.
This patch prevents max_comp_streams's change when zcomp was initialized
as single zcomp and emit the error to user(ex, echo).
[akpm@linux-foundation.org: don't return with the lock held, per Sergey]
[fengguang.wu@intel.com: fix coccinelle warnings]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of returning just NULL, return ERR_PTR from zcomp_create() if
compressing backend creation has failed. ERR_PTR(-EINVAL) for unsupported
compression algorithm request, ERR_PTR(-ENOMEM) for allocation (zcomp or
compression stream) error.
Perform IS_ERR() check of returned from zcomp_create() value in
disksize_store() and set return code to PTR_ERR().
Change suggested by Jerome Marchand.
[akpm@linux-foundation.org: clean up error recovery flow]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While fixing lockdep spew of ->init_lock reported by Sasha Levin [1],
Minchan Kim noted [2] that it's better to move compression backend
allocation (using GPF_KERNEL) out of the ->init_lock lock, same way as
with zram_meta_alloc(), in order to prevent the same lockdep spew.
[1] https://lkml.org/lkml/2014/2/27/337
[2] https://lkml.org/lkml/2014/3/3/32
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Minchan Kim <minchan@kernel.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce LZ4 compression backend and make it available for selection.
LZ4 support is optional and requires user to set ZRAM_LZ4_COMPRESS config
option. The default compression backend is LZO.
TEST
(x86_64, core i5, 2 cores + 2 hyperthreading, zram disk size 1G,
ext4 file system, 3 compression streams)
iozone -t 3 -R -r 16K -s 60M -I +Z
Test LZO LZ4
----------------------------------------------
Initial write 1642744.62 1317005.09
Rewrite 2498980.88 1800645.16
Read 3957026.38 5877043.75
Re-read 3950997.38 5861847.00
Reverse Read 2937114.56 5047384.00
Stride read 2948163.19 4929587.38
Random read 3292692.69 4880793.62
Mixed workload 1545602.62 3502940.38
Random write 2448039.75 1758786.25
Pwrite 1670051.03 1338329.69
Pread 2530682.00 5097177.62
Fwrite 3232085.62 3275942.56
Fread 6306880.25 6645271.12
So on my system LZ4 is slower in write-only tests, while it performs
better in read-only and mixed (reads + writes) tests.
Official LZ4 benchmarks available here http://code.google.com/p/lz4/
(linux kernel uses revision r90).
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add and document `comp_algorithm' device attribute. This attribute allows
to show supported compression and currently selected compression
algorithms:
cat /sys/block/zram0/comp_algorithm
[lzo] lz4
and change selected compression algorithm:
echo lzo > /sys/block/zram0/comp_algorithm
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch allows to change max_comp_streams on initialised zcomp.
Introduce zcomp set_max_streams() knob, zcomp_strm_multi_set_max_streams()
and zcomp_strm_single_set_max_streams() callbacks to change streams limit
for zcomp_strm_multi and zcomp_strm_single, accordingly. set_max_streams
for single steam zcomp does nothing.
If user has lowered the limit, then zcomp_strm_multi_set_max_streams()
attempts to immediately free extra streams (as much as it can, depending
on idle streams availability).
Note, this patch does not allow to change stream 'policy' from single to
multi stream (or vice versa) on already initialised compression backend.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Existing zram (zcomp) implementation has only one compression stream
(buffer and algorithm private part), so in order to prevent data
corruption only one write (compress operation) can use this compression
stream, forcing all concurrent write operations to wait for stream lock
to be released. This patch changes zcomp to keep a compression streams
list of user-defined size (via sysfs device attr). Each write operation
still exclusively holds compression stream, the difference is that we
can have N write operations (depending on size of streams list)
executing in parallel. See TEST section later in commit message for
performance data.
Introduce struct zcomp_strm_multi and a set of functions to manage
zcomp_strm stream access. zcomp_strm_multi has a list of idle
zcomp_strm structs, spinlock to protect idle list and wait queue, making
it possible to perform parallel compressions.
The following set of functions added:
- zcomp_strm_multi_find()/zcomp_strm_multi_release()
find and release a compression stream, implement required locking
- zcomp_strm_multi_create()/zcomp_strm_multi_destroy()
create and destroy zcomp_strm_multi
zcomp ->strm_find() and ->strm_release() callbacks are set during
initialisation to zcomp_strm_multi_find()/zcomp_strm_multi_release()
correspondingly.
Each time zcomp issues a zcomp_strm_multi_find() call, the following set
of operations performed:
- spin lock strm_lock
- if idle list is not empty, remove zcomp_strm from idle list, spin
unlock and return zcomp stream pointer to caller
- if idle list is empty, current adds itself to wait queue. it will be
awaken by zcomp_strm_multi_release() caller.
zcomp_strm_multi_release():
- spin lock strm_lock
- add zcomp stream to idle list
- spin unlock, wake up sleeper
Minchan Kim reported that spinlock-based locking scheme has demonstrated
a severe perfomance regression for single compression stream case,
comparing to mutex-based (see https://lkml.org/lkml/2014/2/18/16)
base spinlock mutex
==Initial write ==Initial write ==Initial write
records: 5 records: 5 records: 5
avg: 1642424.35 avg: 699610.40 avg: 1655583.71
std: 39890.95(2.43%) std: 232014.19(33.16%) std: 52293.96
max: 1690170.94 max: 1163473.45 max: 1697164.75
min: 1568669.52 min: 573429.88 min: 1553410.23
==Rewrite ==Rewrite ==Rewrite
records: 5 records: 5 records: 5
avg: 1611775.39 avg: 501406.64 avg: 1684419.11
std: 17144.58(1.06%) std: 15354.41(3.06%) std: 18367.42
max: 1641800.95 max: 531356.78 max: 1706445.84
min: 1593515.27 min: 488817.78 min: 1655335.73
When only one compression stream available, mutex with spin on owner
tends to perform much better than frequent wait_event()/wake_up(). This
is why single stream implemented as a special case with mutex locking.
Introduce and document zram device attribute max_comp_streams. This
attr shows and stores current zcomp's max number of zcomp streams
(max_strm). Extend zcomp's zcomp_create() with `max_strm' parameter.
`max_strm' limits the number of zcomp_strm structs in compression
backend's idle list (max_comp_streams).
max_comp_streams used during initialisation as follows:
-- passing to zcomp_create() max_strm equals to 1 will initialise zcomp
using single compression stream zcomp_strm_single (mutex-based locking).
-- passing to zcomp_create() max_strm greater than 1 will initialise zcomp
using multi compression stream zcomp_strm_multi (spinlock-based locking).
default max_comp_streams value is 1, meaning that zram with single stream
will be initialised.
Later patch will introduce configuration knob to change max_comp_streams
on already initialised and used zcomp.
TEST
iozone -t 3 -R -r 16K -s 60M -I +Z
test base 1 strm (mutex) 3 strm (spinlock)
-----------------------------------------------------------------------
Initial write 589286.78 583518.39 718011.05
Rewrite 604837.97 596776.38 1515125.72
Random write 584120.11 595714.58 1388850.25
Pwrite 535731.17 541117.38 739295.27
Fwrite 1418083.88 1478612.72 1484927.06
Usage example:
set max_comp_streams to 4
echo 4 > /sys/block/zram0/max_comp_streams
show current max_comp_streams (default value is 1).
cat /sys/block/zram0/max_comp_streams
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is preparation patch to add multi stream support to zcomp.
Introduce struct zcomp_strm_single and a set of functions to manage
zcomp_strm stream access. zcomp_strm_single implements single compession
stream, same way as current zcomp implementation. This moves zcomp_strm
stream control and locking from zcomp, so compressing backend zcomp is not
aware of required locking.
Single and multi streams require different locking schemes. Minchan Kim
reported that spinlock-based locking scheme (which is used in multi stream
implementation) has demonstrated a severe perfomance regression for single
compression stream case, comparing to mutex-based. see
https://lkml.org/lkml/2014/2/18/16
The following set of functions added:
- zcomp_strm_single_find()/zcomp_strm_single_release()
find and release a compression stream, implement required locking
- zcomp_strm_single_create()/zcomp_strm_single_destroy()
create and destroy zcomp_strm_single
New ->strm_find() and ->strm_release() callbacks added to zcomp, which are
set to zcomp_strm_single_find() and zcomp_strm_single_release() during
initialisation. Instead of direct locking and zcomp_strm access from
zcomp_strm_find() and zcomp_strm_release(), zcomp now calls ->strm_find()
and ->strm_release() correspondingly.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do not perform direct LZO compress/decompress calls, initialise
and use zcomp LZO backend (single compression stream) instead.
[akpm@linux-foundation.org: resolve conflicts with zram-delete-zram_init_device-fix.patch]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ZRAM performs direct LZO compression algorithm calls, making it the one
and only option. While LZO is generally performs well, LZ4 algorithm
tends to have a faster decompression (see http://code.google.com/p/lz4/
for full report)
Name Ratio C.speed D.speed
MB/s MB/s
LZ4 (r101) 2.084 422 1820
LZO 2.06 2.106 414 600
Thus, users who have mostly read (decompress) usage scenarious or mixed
workflow (writes with relatively high read ops number) will benefit from
using LZ4 compression backend.
Introduce compressing backend abstraction zcomp in order to support
multiple compression algorithms with the following set of operations:
.create
.destroy
.compress
.decompress
Schematically zram write() usually contains the following steps:
0) preparation (decompression of partioal IO, etc.)
1) lock buffer_lock mutex (protects meta compress buffers)
2) compress (using meta compress buffers)
3) alloc and map zs_pool object
4) copy compressed data (from meta compress buffers) to object allocated by 3)
5) free previous pool page, assign a new one
6) unlock buffer_lock mutex
As we can see, compressing buffers must remain untouched from 1) to 4),
because, otherwise, concurrent write() can overwrite data. At the same
time, zram_meta must be aware of a) specific compression algorithm memory
requirements and b) necessary locking to protect compression buffers. To
remove requirement a) new struct zcomp_strm introduced, which contains a
compress/decompress `buffer' and compression algorithm `private' part.
While struct zcomp implements zcomp_strm stream handling and locking and
removes requirement b) from zram meta. zcomp ->create() and ->destroy(),
respectively, allocate and deallocate algorithm specific zcomp_strm
`private' part.
Every zcomp has zcomp stream and mutex to protect its compression stream.
Stream usage semantics remains the same -- only one write can hold stream
lock and use its buffers. zcomp_strm_find() turns caller into exclusive
user of a stream (holding stream mutex until zram release stream), and
zcomp_strm_release() makes zcomp stream available (unlock the stream
mutex). Hence no concurrent write (compression) operations possible at
the moment.
iozone -t 3 -R -r 16K -s 60M -I +Z
test base patched
--------------------------------------------------
Initial write 597992.91 591660.58
Rewrite 609674.34 616054.97
Read 2404771.75 2452909.12
Re-read 2459216.81 2470074.44
Reverse Read 1652769.66 1589128.66
Stride read 2202441.81 2202173.31
Random read 2236311.47 2276565.31
Mixed workload 1423760.41 1709760.06
Random write 579584.08 615933.86
Pwrite 597550.02 594933.70
Pread 1703672.53 1718126.72
Fwrite 1330497.06 1461054.00
Fread 3922851.00 3957242.62
Usage examples:
comp = zcomp_create(NAME) /* NAME e.g. "lzo" */
which initialises compressing backend if requested algorithm is supported.
Compress:
zstrm = zcomp_strm_find(comp)
zcomp_compress(comp, zstrm, src, &dst_len)
[..] /* copy compressed data */
zcomp_strm_release(comp, zstrm)
Decompress:
zcomp_decompress(comp, src, src_len, dst);
Free compessing backend and its zcomp stream:
zcomp_destroy(comp)
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
allocate new `zram_meta' in disksize_store() only for uninitialised zram
device, saving a number of allocations and deallocations in case if
disksize_store() was called on currently used device. at the same time
zram_meta stack variable is not necessary, because we can set ->meta
directly. there is also no need in setting QUEUE_FLAG_NONROT queue on
every disksize_store(), set it once during device creation.
[minchan@kernel.org: handle zram->meta alloc fail case]
[minchan@kernel.org: prevent lockdep spew of init_lock]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move zram warning about disksize and size of memory correlation to zram
documentation.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
| |
struct table `count' member is not used.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zram accounted but did not report numbers of failed read and write
queries. make these stats available as failed_reads and failed_writes
attrs.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce ZRAM_ATTR_RO macro that generates device_attribute and default
ATTR show() function for existing atomic64_t zram stats.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a preparation patch for stats code duplication removal.
1) use atomic64_t for `pages_zero' and `pages_stored' zram stats.
2) `compr_size' and `pages_zero' struct zram_stats members did not
follow the existing device attr naming scheme: zram_stats.ATTR has
ATTR_show() function. rename them:
-- compr_size -> compr_data_size
-- pages_zero -> zero_pages
Minchan Kim's note:
If we really have trouble with atomic stat operation, we could
change it with percpu_counter so that it could solve atomic overhead and
unnecessary memory space by introducing unsigned long instead of 64bit
atomic_t.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove `good' and `bad' compressed sub-requests stats. RW request may
cause a number of RW sub-requests. zram used to account `good' compressed
sub-queries (with compressed size less than 50% of original size), `bad'
compressed sub-queries (with compressed size greater that 75% of original
size), leaving sub-requests with compression size between 50% and 75% of
original size not accounted and not reported. zram already accounts each
sub-request's compression size so we can calculate real device compression
ratio.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do not pass rw argument down the __zram_make_request() -> zram_bvec_rw()
chain, decode it in zram_bvec_rw() instead. Besides, this is the place
where we distinguish READ and WRITE bio data directions, so account zram
RW stats here, instead of __zram_make_request(). This also allows to
account a real number of zram READ/WRITE operations, not just requests
(single RW request may cause a number of zram RW ops with separate
locking, compression/decompression, etc).
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce init_done() helper function which allows us to drop `init_done'
struct zram member. init_done() uses the fact that ->init_done == 1
equals to ->meta != NULL.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
| |
"mm: introduce vm_ops->map_pages()" wants to export a do_set_pte() from core
kernel. Rename lguest's do_set_pte() to something more lguest-specific.
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"Nothing major: the stricter permissions checking for sysfs broke a
staging driver; fix included. Greg KH said he'd take the patch but
hadn't as the merge window opened, so it's included here to avoid
breaking build"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
staging: fix up speakup kobject mode
Use 'E' instead of 'X' for unsigned module taint flag.
VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.
kallsyms: fix percpu vars on x86-64 with relocation.
kallsyms: generalize address range checking
module: LLVMLinux: Remove unused function warning from __param_check macro
Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
module: remove MODULE_GENERIC_TABLE
module: allow multiple calls to MODULE_DEVICE_TABLE() per module
module: use pr_cont
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It uses the unnecessary S_IFREG bit which broke when my
stricter-checking-for-mode patch went in.
Since we're fixing it anyway, the extra level of indirection is
confusing for readers (ROOT_W == rw-r--r-- for example).
Also, many of these are other-writable. Is that really intended?
I'll-queue-this-patch-up-in-a-bit-by: Greg KH <greg@kroah.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Summary of http://lkml.org/lkml/2014/3/14/363 :
Ted: module_param(queue_depth, int, 444)
Joe: 0444!
Rusty: User perms >= group perms >= other perms?
Joe: CLASS_ATTR, DEVICE_ATTR, SENSOR_ATTR and SENSOR_ATTR_2?
Side effect of stricter permissions means removing the unnecessary
S_IFREG from several callers.
Note that the BUILD_BUG_ON_ZERO((perm) & 2) test was removed: a fair
number of drivers fail this test, so that will be the debate for a
future patch.
Suggested-by: Joe Perches <joe@perches.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> for drivers/pci/slot.c
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper changes from Mike Snitzer:
- Fix dm-cache corruption caused by discard_block_size > cache_block_size
- Fix a lock-inversion detected by LOCKDEP in dm-cache
- Fix a dangling bio bug in the dm-thinp target's process_deferred_bios
error path
- Fix corruption due to non-atomic transaction commit which allowed a
metadata superblock to be written before all other metadata was
successfully written -- this is common to all targets that use the
persistent-data library's transaction manager (dm-thinp, dm-cache and
dm-era).
- Various small cleanups in the DM core
- Add the dm-era target which is useful for keeping track of which
blocks were written within a user defined period of time called an
'era'. Use cases include tracking changed blocks for backup
software, and partially invalidating the contents of a cache to
restore cache coherency after rolling back a vendor snapshot.
- Improve the on-disk layout of multithreaded writes to the
dm-thin-pool by splitting the pool's deferred bio list to be a
per-thin device list and then sorting that list using an rb_tree.
The subsequent read throughput of the data written via multiple
threads improved by ~70%.
- Simplify the multipath target's handling of queuing IO by pushing
requests back to the request queue rather than queueing the IO
internally.
* tag 'dm-3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (24 commits)
dm cache: fix a lock-inversion
dm thin: sort the per thin deferred bios using an rb_tree
dm thin: use per thin device deferred bio lists
dm thin: simplify pool_is_congested
dm thin: fix dangling bio in process_deferred_bios error path
dm mpath: print more useful warnings in multipath_message()
dm-mpath: do not activate failed paths
dm mpath: remove extra nesting in map function
dm mpath: remove map_io()
dm mpath: reduce memory pressure when requeuing
dm mpath: remove process_queued_ios()
dm mpath: push back requests instead of queueing
dm table: add dm_table_run_md_queue_async
dm mpath: do not call pg_init when it is already running
dm: use RCU_INIT_POINTER instead of rcu_assign_pointer in __unbind
dm: stop using bi_private
dm: remove dm_get_mapinfo
dm: make dm_table_alloc_md_mempools static
dm: take care to copy the space map roots before locking the superblock
dm transaction manager: fix corruption due to non-atomic transaction commit
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When suspending a cache the policy is walked and the individual policy
hints written to the metadata via sync_metadata(). This led to this
lock order:
policy->lock
cache_metadata->root_lock
When loading the cache target the policy is populated while the metadata
lock is held:
cache_metadata->root_lock
policy->lock
Fix this potential lock-inversion (ABBA) deadlock in sync_metadata() by
ensuring the cache_metadata root_lock is held whilst all the hints are
written, rather than being repeatedly locked while policy->lock is held
(as was the case with each callout that policy_walk_mappings() made to
the old save_hint() method).
Found by turning on the CONFIG_PROVE_LOCKING ("Lock debugging: prove
locking correctness") build option. However, it is not clear how the
LOCKDEP reported paths can lead to a deadlock since the two paths,
suspending a target and loading a target, never occur at the same time.
But that doesn't mean the same lock-inversion couldn't have occurred
elsewhere.
Reported-by: Marian Csontos <mcsontos@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
A thin-pool will allocate blocks using FIFO order for all thin devices
which share the thin-pool. Because of this simplistic allocation the
thin-pool's space can become fragmented quite easily; especially when
multiple threads are requesting blocks in parallel.
Sort each thin device's deferred_bio_list based on logical sector to
help reduce fragmentation of the thin-pool's ondisk layout.
The following tables illustrate the realized gains/potential offered by
sorting each thin device's deferred_bio_list. An "io size"-sized random
read of the device would result in "seeks/io" fragments being read, with
an average "distance/seek" between each fragment.
Data was written to a single thin device using multiple threads via
iozone (8 threads, 64K for both the block_size and io_size).
unsorted:
io size seeks/io distance/seek
--------------------------------------
4k 0.000 0b
16k 0.013 11m
64k 0.065 11m
256k 0.274 10m
1m 1.109 10m
4m 4.411 10m
16m 17.097 11m
64m 60.055 13m
256m 148.798 25m
1g 809.929 21m
sorted:
io size seeks/io distance/seek
--------------------------------------
4k 0.000 0b
16k 0.000 1g
64k 0.001 1g
256k 0.003 1g
1m 0.011 1g
4m 0.045 1g
16m 0.181 1g
64m 0.747 1011m
256m 3.299 1g
1g 14.373 1g
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The thin-pool previously only had a single deferred_bios list that would
collect bios for all thin devices in the pool. Split this per-pool
deferred_bios list out to per-thin deferred_bios_list -- doing so
enables increased parallelism when processing deferred bios. And now
that each thin device has it's own deferred_bios_list we can sort all
bios in the list using logical sector. The requeue code in error
handling path is also cleaner as a side-effect.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The pool is congested if the pool is in PM_OUT_OF_DATA_SPACE mode. This
is more explicit/clear/efficient than inferring whether or not the pool
is congested by checking if retry_on_resume_list is empty.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If unable to ensure_next_mapping() we must add the current bio, which
was removed from the @bios list via bio_list_pop, back to the
deferred_bios list before all the remaining @bios.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The warning message "Unrecognised multipath message received" is
displayed in two different situations in multipath_message(): when the
number of arguments passed is invalid and when the string passed in
argv[0] is not recognized.
Make it easier to identify where the problem is by making these warnings
more specific with additional context for each case.
Signed-off-by: Jose Castillo <jcastillo@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
activate_path() is run without a lock, so the path might be
set to failed before activate_path() had a chance to run.
This patch add a check for ->active in activate_path() to
avoid unnecessary overhead by calling functions which are known
to be failing.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Return early for case when no path exists, and when the
pathgroup isn't ready. This eliminates the need for
extra nesting for the the common case.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
multipath_map() is now just a wrapper around map_io(), so we
can rename map_io() to multipath_map().
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When multipath needs to requeue I/O in the block layer the per-request
context shouldn't be allocated, as it will be freed immediately
afterwards anyway. Avoiding this memory allocation will reduce memory
pressure during requeuing.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
process_queued_ios() has served 3 functions:
1) select pg and pgpath if none is selected
2) start pg_init if requested
3) dispatch queued IOs when pg is ready
Basically, a call to queue_work(process_queued_ios) can be replaced by
dm_table_run_md_queue_async(), which runs request queue and ends up
calling map_io(), which does 1), 2) and 3).
Exception is when !pg_ready() (which means either pg_init is running or
requested), then multipath_busy() prevents map_io() being called from
request_fn.
If pg_init is running, it should be ok as long as pg_init_done() does
the right thing when pg_init is completed, I.e.: restart pg_init if
!pg_ready() or call dm_table_run_md_queue_async() to kick map_io().
If pg_init is requested, we have to make sure the request is detected
and pg_init will be started. pg_init is requested in 3 places:
a) __choose_pgpath() in map_io()
b) __choose_pgpath() in multipath_ioctl()
c) pg_init retry in pg_init_done()
a) is ok because map_io() calls __pg_init_all_paths(), which does 2).
b) needs a call to __pg_init_all_paths(), which does 2).
c) needs a call to __pg_init_all_paths(), which does 2).
So this patch removes process_queued_ios() and ensures that
__pg_init_all_paths() is called at the appropriate locations.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There is no reason why multipath needs to queue requests internally for
queue_if_no_path or pg_init; we should rather push them back onto the
request queue.
And while we're at it we can simplify the conditional statement in
map_io() to make it easier to read.
Since mpath no longer does internal queuing of I/O the table info no
longer emits the internal queue_size. Instead it displays 1 if queuing
is being used or 0 if it is not.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Introduce dm_table_run_md_queue_async() to run the request_queue of the
mapped_device associated with a request-based DM table.
Also add dm_md_get_queue() wrapper to extract the request_queue from a
mapped_device.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch moves condition checks as a preparation of following
patches and has no effect on behaviour.
process_queued_ios() is the only caller of __pg_init_all_paths()
and 2 condition checks are moved from outside to inside without
side effects.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Replace rcu_assign_pointer(p, NULL) with RCU_INIT_POINTER(p, NULL).
The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure. And in the
case of the NULL pointer, there is no structure to initialize. So,
rcu_assign_pointer(p, NULL) can be safely converted to
RCU_INIT_POINTER(p, NULL).
Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Device mapper uses the bio structure's bi_private field as a pointer
to dm_target_io or dm_rq_clone_bio_info. But a bio structure is
embedded in the dm_target_io and dm_rq_clone_bio_info structures, so the
pointer to the structure that contains the bio can be found with the
container_of() macro.
Remove the use of bi_private and use container_of() instead.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Remove dm_get_mapinfo() because no target uses it. Targets can allocate
per-bio data using ti->per_bio_data_size, this is much more flexible
than union map_info.
Leave union map_info only for the request-based multipath target's use.
Also delete the unused "unsigned long long ll" field of union map_info.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make the function dm_table_alloc_md_mempools static because it is not
called from another file.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In theory copying the space map root can fail, but in practice it never
does because we're careful to check what size buffer is needed.
But make certain we're able to copy the space map roots before
locking the superblock.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # drop dm-era and dm-cache changes as needed
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The persistent-data library used by dm-thin, dm-cache, etc is
transactional. If anything goes wrong, such as an io error when writing
new metadata or a power failure, then we roll back to the last
transaction.
Atomicity when committing a transaction is achieved by:
a) Never overwriting data from the previous transaction.
b) Writing the superblock last, after all other metadata has hit the
disk.
This commit and the following commit ("dm: take care to copy the space
map roots before locking the superblock") fix a bug associated with (b).
When committing it was possible for the superblock to still be written
in spite of an io error occurring during the preceeding metadata flush.
With these commits we're careful not to take the write lock out on the
superblock until after the metadata flush has completed.
Change the transaction manager's semantics for dm_tm_commit() to assume
all data has been flushed _before_ the single superblock that is passed
in.
As a prerequisite, split the block manager's block unlocking and
flushing by simplifying dm_bm_flush_and_unlock() to dm_bm_flush(). Now
the unlocking must be done separately.
This issue was discovered by forcing io errors at the crucial time
using dm-flakey.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Discard block size not being equal to cache block size causes data
corruption by erroneously avoiding migrations in issue_copy() because
the discard state is being cleared for a group of cache blocks when it
should not.
Completely remove all code that enabled a distinction between the
cache block size and discard block size.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the discard block size is larger than the cache block size we will
not properly quiesce IO to a region that is about to be discarded. This
results in a race between a cache migration where no copy is needed, and
a write to an adjacent cache block that's within the same large discard
block.
Workaround this by limiting the discard_block_size to cache_block_size.
Also limit the max_discard_sectors to cache_block_size.
A more comprehensive fix that introduces range locking support in the
bio_prison and proper quiescing of a discard range that spans multiple
cache blocks is already in development.
Reported-by: Morgan Mears <Morgan.Mears@netapp.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Acked-by: Heinz Mauelshagen <heinzm@redhat.com>
Cc: stable@vger.kernel.org
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This change offers a big performance boost for dm-era.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|