| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the arm64 CRC-T10DIF assembly code into the lib directory and wire
it up to the library interface. This allows it to be used without going
through the crypto API. It remains usable via the crypto API too via
the shash algorithms that use the library interface. Thus all the
arch-specific "shash" code becomes unnecessary and is removed.
Note: to see the diff from arch/arm64/crypto/crct10dif-ce-glue.c to
arch/arm64/lib/crc-t10dif-glue.c, view this commit with 'git show -M10'.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20241202012056.209768-7-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make the CRC32 library export a function crc32_optimizations() which
returns flags that indicate which CRC32 functions are actually executing
optimized code at runtime.
This will be used to determine whether the crc32[c]-$arch shash
algorithms should be registered in the crypto API. btrfs could also
start using these flags instead of the hack that it currently uses where
it parses the crypto_shash_driver_name.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20241202010844.144356-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the CRC32 library functions are defined as weak symbols, and
the arm64 and riscv architectures override them.
This method of arch-specific overrides has the limitation that it only
works when both the base and arch code is built-in. Also, it makes the
arch-specific code be silently not used if it is accidentally built with
lib-y instead of obj-y; unfortunately the RISC-V code does this.
This commit reorganizes the code to have explicit *_arch() functions
that are called when they are enabled, similar to how some of the crypto
library code works (e.g. chacha_crypt() calls chacha_crypt_arch()).
Make the existing kconfig choice for the CRC32 implementation also
control whether the arch-optimized implementation (if one is available)
is enabled or not. Make it enabled by default if CRC32 is also enabled.
The result is that arch-optimized CRC32 library functions will be
included automatically when appropriate, but it is now possible to
disable them. They can also now be built as a loadable module if the
CRC32 library functions happen to be used only by loadable modules, in
which case the arch and base CRC32 modules will be automatically loaded
via direct symbol dependency when appropriate.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20241202010844.144356-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the leading underscores from __crc32c_le_base().
This is in preparation for adding crc32c_le_arch() and eventually
renaming __crc32c_le() to crc32c_le().
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20241202010844.144356-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* for-next/mops:
: More FEAT_MOPS (memcpy instructions) uses - in-kernel routines
arm64: mops: Document requirements for hypervisors
arm64: lib: Use MOPS for copy_page() and clear_page()
arm64: lib: Use MOPS for memcpy() routines
arm64: mops: Document booting requirement for HCR_EL2.MCE2
arm64: mops: Handle MOPS exceptions from EL1
arm64: probes: Disable kprobes/uprobes on MOPS instructions
# Conflicts:
# arch/arm64/kernel/entry-common.c
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Similarly to what was done to the memcpy() routines, make copy_page()
and clear_page() also use the Armv8.8 FEAT_MOPS instructions.
Note: For copy_page() this uses the CPY* instructions instead of CPYF*
as CPYF* doesn't allow src and dst to be equal. It's not clear if
copy_page() needs to allow equal src and dst but it has worked so far
with the current implementation and there is no documentation forbidding
it.
Note, the unoptimized version of copy_page() in assembler.h is left as
it is.
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20240930161051.3777828-6-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Make memcpy(), memmove() and memset() use the Armv8.8 FEAT_MOPS
instructions when implemented on the CPU.
The CPY*/SET* instructions copy or set a block of memory of arbitrary
size and alignment. They can be interrupted by the CPU and the copying
resumed later. Their performance is expected to be close to the best
generic copy/set sequence of loads/stores for a given CPU. Using them in
the kernel's copy/set routines therefore avoids the need to periodically
rewrite the routines to optimize for new microarchitectures. It could
also lead to a performance improvement for some CPUs and systems.
With this change the kernel will always use the instructions if they are
implemented on the CPU (and have not been disabled by the arm64.nomops
command line parameter). When not implemented the usual routines will be
used (patched via alternatives). Note, we need to patch B/NOP instead of
the whole sequence to avoid executing a partially patched sequence in
case the compiler generates a mem*() call inside the alternatives
patching code.
Note that MOPS instructions have relaxed behavior on Device memory, but
it is expected that these routines are not generally used on MMIO.
Note: For memcpy(), this uses the CPY* instructions instead of CPYF*, as
CPY* allows overlaps between the source and destination buffers, and
despite contradicting the C standard, compilers require that memcpy()
work on exactly overlapping source and destination:
https://gcc.gnu.org/onlinedocs/gcc/Standards.html#C-Language
https://reviews.llvm.org/D86993
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20240930161051.3777828-5-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Now that kernel mode NEON no longer disables preemption, using FP/SIMD
in library code which is not obviously part of the crypto subsystem is
no longer problematic, as it will no longer incur unexpected latencies.
So accelerate the CRC-32 library code on arm64 to use a 4-way
interleave, using PMULL instructions to implement the folding.
On Apple M2, this results in a speedup of 2 - 2.8x when using input
sizes of 1k - 8k. For smaller sizes, the overhead of preserving and
restoring the FP/SIMD register file may not be worth it, so 1k is used
as a threshold for choosing this code path.
The coefficient tables were generated using code provided by Eric. [0]
[0] https://github.com/ebiggers/libdeflate/blob/master/scripts/gen_crc32_multipliers.c
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20241018075347.2821102-8-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In preparation for a new user, reorganize the bit/byte ordering macros
that are used to parameterize the crc32 template code and instantiate
CRC-32, CRC-32c and 'big endian' CRC-32.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20241018075347.2821102-7-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|/
|
|
|
|
|
|
|
|
|
| |
In preparation for adding another code path for performing CRC-32, move
the alternative patching for ARM64_HAS_CRC32 into C code. The logic for
deciding whether to use this new code path will be implemented in C too.
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20241018075347.2821102-6-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.
Link: https://lkml.kernel.org/r/20240329072441.591471-6-samuel.holland@sifive.com
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: WANG Xuerui <git@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Support an instruction for resolving absolute addresses of per-CPU
data from their per-CPU offsets. This instruction is internal-only and
users are not allowed to use them directly. They will only be used for
internal inlining optimizations for now between BPF verifier and BPF
JITs.
Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu
access using tpidr_el1"), the per-cpu offset for the CPU is stored in
the tpidr_el1/2 register of that CPU.
To support this BPF instruction in the ARM64 JIT, the following ARM64
instructions are emitted:
mov dst, src // Move src to dst, if src != dst
mrs tmp, tpidr_el1/2 // Move per-cpu offset of the current cpu in tmp.
add dst, dst, tmp // Add the per cpu offset to the dst.
To measure the performance improvement provided by this change, the
benchmark in [1] was used:
Before:
glob-arr-inc : 23.597 ± 0.012M/s
arr-inc : 23.173 ± 0.019M/s
hash-inc : 12.186 ± 0.028M/s
After:
glob-arr-inc : 23.819 ± 0.034M/s
arr-inc : 23.285 ± 0.017M/s
hash-inc : 12.419 ± 0.011M/s
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Back in 2016, it was argued that implementations lacking a HW
prefetcher could be helped by sprinkling a number of PRFM
instructions in strategic locations.
In 2023, the one platform that presumably needed this hack is no
longer in active use (let alone maintained), and an quick
experiment shows dropping this hack only leads to a 0.4% drop
on a full kernel compilation (tested on a MT30-GS0 48 CPU system).
Given that this is pretty much in the noise department and that
it may give odd ideas to other implementers, drop the hack for
good.
Suggested-by: Will Deacon <will@kernel.org>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20231122133754.1240687-1-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In __delay() we use cpus_have_const_cap() to check for ARM64_HAS_WFXT,
but this is not necessary and alternative_has_cap() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
The cpus_have_const_cap() check in __delay() is an optimization to use
WFIT and WFET in preference to busy-polling the counter and/or using
regular WFE and relying upon the architected timer event stream. It is
not necessary to apply this optimization in the window between detecting
the ARM64_HAS_WFXT cpucap and patching alternatives.
This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The main one is a fix for a broken strscpy() conversion that landed in
the merge window and broke early parsing of the kernel command line.
- Fix an incorrect mask in the CXL PMU driver
- Fix a regression in early parsing of the kernel command line
- Fix an IP checksum OoB access reported by syzbot"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: csum: Fix OoB access in IP checksum code for negative lengths
arm64/sysreg: Fix broken strncpy() -> strscpy() conversion
perf: CXL: fix mismatched number of counters mask
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Although commit c2c24edb1d9c ("arm64: csum: Fix pathological zero-length
calls") added an early return for zero-length input, syzkaller has
popped up with an example of a _negative_ length which causes an
undefined shift and an out-of-bounds read:
| BUG: KASAN: slab-out-of-bounds in do_csum+0x44/0x254 arch/arm64/lib/csum.c:39
| Read of size 4294966928 at addr ffff0000d7ac0170 by task syz-executor412/5975
|
| CPU: 0 PID: 5975 Comm: syz-executor412 Not tainted 6.4.0-rc4-syzkaller-g908f31f2a05b #0
| Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
| Call trace:
| dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:233
| show_stack+0x2c/0x44 arch/arm64/kernel/stacktrace.c:240
| __dump_stack lib/dump_stack.c:88 [inline]
| dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
| print_address_description mm/kasan/report.c:351 [inline]
| print_report+0x174/0x514 mm/kasan/report.c:462
| kasan_report+0xd4/0x130 mm/kasan/report.c:572
| kasan_check_range+0x264/0x2a4 mm/kasan/generic.c:187
| __kasan_check_read+0x20/0x30 mm/kasan/shadow.c:31
| do_csum+0x44/0x254 arch/arm64/lib/csum.c:39
| csum_partial+0x30/0x58 lib/checksum.c:128
| gso_make_checksum include/linux/skbuff.h:4928 [inline]
| __udp_gso_segment+0xaf4/0x1bc4 net/ipv4/udp_offload.c:332
| udp6_ufo_fragment+0x540/0xca0 net/ipv6/udp_offload.c:47
| ipv6_gso_segment+0x5cc/0x1760 net/ipv6/ip6_offload.c:119
| skb_mac_gso_segment+0x2b4/0x5b0 net/core/gro.c:141
| __skb_gso_segment+0x250/0x3d0 net/core/dev.c:3401
| skb_gso_segment include/linux/netdevice.h:4859 [inline]
| validate_xmit_skb+0x364/0xdbc net/core/dev.c:3659
| validate_xmit_skb_list+0x94/0x130 net/core/dev.c:3709
| sch_direct_xmit+0xe8/0x548 net/sched/sch_generic.c:327
| __dev_xmit_skb net/core/dev.c:3805 [inline]
| __dev_queue_xmit+0x147c/0x3318 net/core/dev.c:4210
| dev_queue_xmit include/linux/netdevice.h:3085 [inline]
| packet_xmit+0x6c/0x318 net/packet/af_packet.c:276
| packet_snd net/packet/af_packet.c:3081 [inline]
| packet_sendmsg+0x376c/0x4c98 net/packet/af_packet.c:3113
| sock_sendmsg_nosec net/socket.c:724 [inline]
| sock_sendmsg net/socket.c:747 [inline]
| __sys_sendto+0x3b4/0x538 net/socket.c:2144
Extend the early return to reject negative lengths as well, aligning our
implementation with the generic code in lib/checksum.c
Cc: Robin Murphy <robin.murphy@arm.com>
Fixes: 5777eaed566a ("arm64: Implement optimised checksum routine")
Reported-by: syzbot+4a9f9820bd8d302e22f7@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000e0e94c0603f8d213@google.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To support BPF sign-extend load instructions, add encoders for
LDRSB/LDRSH/LDRSW.
LDRSB/LDRSH/LDRSW (immediate) is encoded as follows:
3 2 2 2 2 1 0 0
0 7 6 4 2 0 5 0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| sz|1 1 1|0|0 1|opc| imm12 | Rn | Rt |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
LDRSB/LDRSH/LDRSW (register) is encoded as follows:
3 2 2 2 2 2 1 1 1 1 0 0
0 7 6 4 2 1 6 3 2 0 5 0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| sz|1 1 1|0|0 0|opc|1| Rm | opt |S|1 0| Rn | Rt |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
where:
- sz
indicates whether 8-bit, 16-bit or 32-bit data is to be loaded
- opc
opc[1] (bit 23) is always 1 and opc[0] == 1 indicates regsize
is 32-bit. Since BPF signed load instructions always exend the
sign bit to bit 63 regardless of whether it loads an 8-bit,
16-bit or 32-bit data. So only 64-bit register size is required.
That is, it's sufficient to set field opc fixed to 0x2.
- opt
Indicates whether to sign extend the offset register Rm and the
effective bits of Rm. We set opt to 0x7 (SXTX) since we'll use
Rm as a sgined 64-bit value in BPF.
- S
Optional only when opt field is 0x3 (LSL)
In short, the above fields are encoded to the values listed below.
sz opc opt S
LDRSB (immediate) 0x0 0x2 na na
LDRSH (immediate) 0x1 0x2 na na
LDRSW (immediate) 0x2 0x2 na na
LDRSB (register) 0x0 0x2 0x7 0
LDRSH (register) 0x1 0x2 0x7 0
LDRSW (register) 0x2 0x2 0x7 0
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Florent Revest <revest@chromium.org>
Acked-by: Florent Revest <revest@chromium.org>
Link: https://lore.kernel.org/bpf/20230815154158.717901-2-xukuohai@huaweicloud.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The only references to these functions are in the same file, and
there is no prototype, which causes a harmless warning:
arch/arm64/lib/xor-neon.c:13:6: error: no previous prototype for 'xor_arm64_neon_2' [-Werror=missing-prototypes]
arch/arm64/lib/xor-neon.c:40:6: error: no previous prototype for 'xor_arm64_neon_3' [-Werror=missing-prototypes]
arch/arm64/lib/xor-neon.c:76:6: error: no previous prototype for 'xor_arm64_neon_4' [-Werror=missing-prototypes]
arch/arm64/lib/xor-neon.c:121:6: error: no previous prototype for 'xor_arm64_neon_5' [-Werror=missing-prototypes]
Fixes: cc9f8349cb33 ("arm64: crypto: add NEON accelerated XOR implementation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230516160642.523862-2-arnd@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 21b56c847753 ("iov_iter: get rid of separate bvec and xarray
callbacks") removed the calls to memcpy_page_flushcache().
Remove the unnecessary memcpy_page_flushcache() call.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Dan Williams" <dan.j.williams@intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20221230-kmap-x86-v1-3-15f1ecccab50@intel.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* for-next/sysregs: (39 commits)
arm64/sysreg: Remove duplicate definitions from asm/sysreg.h
arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation
arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation
arm64/sysreg: Convert MVFR2_EL1 to automatic generation
arm64/sysreg: Convert MVFR1_EL1 to automatic generation
arm64/sysreg: Convert MVFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation
arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation
arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_ISAR6_EL1 to automatic generation
arm64/sysreg: Convert ID_ISAR5_EL1 to automatic generation
arm64/sysreg: Convert ID_ISAR4_EL1 to automatic generation
arm64/sysreg: Convert ID_ISAR3_EL1 to automatic generation
arm64/sysreg: Convert ID_ISAR2_EL1 to automatic generation
arm64/sysreg: Convert ID_ISAR1_EL1 to automatic generation
arm64/sysreg: Convert ID_ISAR0_EL1 to automatic generation
arm64/sysreg: Convert ID_MMFR4_EL1 to automatic generation
arm64/sysreg: Convert ID_MMFR3_EL1 to automatic generation
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
With the new-fangled generation of asm/sysreg-defs.h, some definitions
have ended up being duplicated between the two files.
Remove these duplicate definitions, and consolidate the naming for
GMID_EL1_BS_WIDTH.
Signed-off-by: Will Deacon <will@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
All users of aarch64_insn_gen_hint() (e.g. aarch64_insn_gen_nop()) pass
a constant argument and generate a constant value. Some of those users
are noinstr code (e.g. for alternatives patching).
For noinstr code it is necessary to either inline these functions or to
ensure the out-of-line versions are noinstr.
Since in all cases these are generating a constant, make them
__always_inline.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20221114135928.3000571-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The only code which needs to check for an entire instruction group is
the aarch64_insn_is_steppable() helper function used by kprobes, which
must not be instrumented, and only needs to check for the "Branch,
exception generation and system instructions" class.
Currently we have an out-of-line helper in insn.c which must be marked
as __kprobes, which indexes a table with some bits extracted from the
instruction. In aarch64_insn_is_steppable() we then need to compare the
result with an expected enum value.
It would be simpler to have a predicate for this, as with the other
aarch64_insn_is_*() helpers, which would be always inlined to prevent
inadvertent instrumentation, and would permit better code generation.
This patch adds a predicate function for this instruction group using
the existing __AARCH64_INSN_FUNCS() helpers, and removes the existing
out-of-line helper. As the only class we currently care about is the
branch+exception+sys class, I have only added helpers for this, and left
the other classes unimplemented for now.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20221114135928.3000571-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We have a number of aarch64_insn_*() predicates which are used in code
which is not instrumentation safe (e.g. alternatives patching, kprobes).
Some of those are marked with __kprobes, but most are not, and are
implemented out-of-line in insn.c.
This patch moves the predicates to insn.h and marks them with
__always_inline. This is ensures that they will respect the
instrumentation requirements of their caller which they will be inlined
into.
At the same time, I've formatted each of the functions consistently as a
list, to make them easier to read and update in future.
Other than preventing unwanted instrumentation, there should be no
functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20221114135928.3000571-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are no users of aarch64_insn_gen_prefetch(), and which encodes a
PRFM (immediate) with a hard-coded offset of 0.
Remove it for now; we can always restore it with tests if we need it in
future.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20221114135928.3000571-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking changes from Paolo Abeni:
"Core:
- Refactor the forward memory allocation to better cope with memory
pressure with many open sockets, moving from a per socket cache to
a per-CPU one
- Replace rwlocks with RCU for better fairness in ping, raw sockets
and IP multicast router.
- Network-side support for IO uring zero-copy send.
- A few skb drop reason improvements, including codegen the source
file with string mapping instead of using macro magic.
- Rename reference tracking helpers to a more consistent netdev_*
schema.
- Adapt u64_stats_t type to address load/store tearing issues.
- Refine debug helper usage to reduce the log noise caused by bots.
BPF:
- Improve socket map performance, avoiding skb cloning on read
operation.
- Add support for 64 bits enum, to match types exposed by kernel.
- Introduce support for sleepable uprobes program.
- Introduce support for enum textual representation in libbpf.
- New helpers to implement synproxy with eBPF/XDP.
- Improve loop performances, inlining indirect calls when possible.
- Removed all the deprecated libbpf APIs.
- Implement new eBPF-based LSM flavor.
- Add type match support, which allow accurate queries to the eBPF
used types.
- A few TCP congetsion control framework usability improvements.
- Add new infrastructure to manipulate CT entries via eBPF programs.
- Allow for livepatch (KLP) and BPF trampolines to attach to the same
kernel function.
Protocols:
- Introduce per network namespace lookup tables for unix sockets,
increasing scalability and reducing contention.
- Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.
- Add support to forciby close TIME_WAIT TCP sockets via user-space
tools.
- Significant performance improvement for the TLS 1.3 receive path,
both for zero-copy and not-zero-copy.
- Support for changing the initial MTPCP subflow priority/backup
status
- Introduce virtually contingus buffers for sockets over RDMA, to
cope better with memory pressure.
- Extend CAN ethtool support with timestamping capabilities
- Refactor CAN build infrastructure to allow building only the needed
features.
Driver API:
- Remove devlink mutex to allow parallel commands on multiple links.
- Add support for pause stats in distributed switch.
- Implement devlink helpers to query and flash line cards.
- New helper for phy mode to register conversion.
New hardware / drivers:
- Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.
- Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.
- Ethernet DSA driver for the Microchip LAN937x switch.
- Ethernet PHY driver for the Aquantia AQR113C EPHY.
- CAN driver for the OBD-II ELM327 interface.
- CAN driver for RZ/N1 SJA1000 CAN controller.
- Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.
Drivers:
- Intel Ethernet NICs:
- i40e: add support for vlan pruning
- i40e: add support for XDP framented packets
- ice: improved vlan offload support
- ice: add support for PPPoE offload
- Mellanox Ethernet (mlx5)
- refactor packet steering offload for performance and scalability
- extend support for TC offload
- refactor devlink code to clean-up the locking schema
- support stacked vlans for bridge offloads
- use TLS objects pool to improve connection rate
- Netronome Ethernet NICs (nfp):
- extend support for IPv6 fields mangling offload
- add support for vepa mode in HW bridge
- better support for virtio data path acceleration (VDPA)
- enable TSO by default
- Microsoft vNIC driver (mana)
- add support for XDP redirect
- Others Ethernet drivers:
- bonding: add per-port priority support
- microchip lan743x: extend phy support
- Fungible funeth: support UDP segmentation offload and XDP xmit
- Solarflare EF100: add support for virtual function representors
- MediaTek SoC: add XDP support
- Mellanox Ethernet/IB switch (mlxsw):
- dropped support for unreleased H/W (XM router).
- improved stats accuracy
- unified bridge model coversion improving scalability (parts 1-6)
- support for PTP in Spectrum-2 asics
- Broadcom PHYs
- add PTP support for BCM54210E
- add support for the BCM53128 internal PHY
- Marvell Ethernet switches (prestera):
- implement support for multicast forwarding offload
- Embedded Ethernet switches:
- refactor OcteonTx MAC filter for better scalability
- improve TC H/W offload for the Felix driver
- refactor the Microchip ksz8 and ksz9477 drivers to share the
probe code (parts 1, 2), add support for phylink mac
configuration
- Other WiFi:
- Microchip wilc1000: diable WEP support and enable WPA3
- Atheros ath10k: encapsulation offload support
Old code removal:
- Neterion vxge ethernet driver: this is untouched since more than 10 years"
* tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits)
doc: sfp-phylink: Fix a broken reference
wireguard: selftests: support UML
wireguard: allowedips: don't corrupt stack when detecting overflow
wireguard: selftests: update config fragments
wireguard: ratelimiter: use hrtimer in selftest
net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
net: usb: ax88179_178a: Bind only to vendor-specific interface
selftests: net: fix IOAM test skip return code
net: usb: make USB_RTL8153_ECM non user configurable
net: marvell: prestera: remove reduntant code
octeontx2-pf: Reduce minimum mtu size to 60
net: devlink: Fix missing mutex_unlock() call
net/tls: Remove redundant workqueue flush before destroy
net: txgbe: Fix an error handling path in txgbe_probe()
net: dsa: Fix spelling mistakes and cleanup code
Documentation: devlink: add add devlink-selftests to the table of contents
dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
net: ionic: fix error check for vlan flags in ionic_set_nic_features()
net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
nfp: flower: add support for tunnel offload without key ID
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add LDR (literal) instruction to load data from address relative to PC.
This instruction will be used to implement long jump from bpf prog to
bpf trampoline in the follow-up patch.
The instruction encoding:
3 2 2 2 0 0
0 7 6 4 5 0
+-----+-------+---+-----+-------------------------------------+--------+
| 0 x | 0 1 1 | 0 | 0 0 | imm19 | Rt |
+-----+-------+---+-----+-------------------------------------+--------+
for 32-bit, variant x == 0; for 64-bit, x == 1.
branch_imm_common() is used to check the distance between pc and target
address, since it's reused by this patch and LDR (literal) is not a branch
instruction, rename it to label_imm_common().
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/bpf/20220711150823.2128542-3-xukuohai@huawei.com
|
|/
|
|
|
|
|
|
|
|
| |
Usually our defines for bitfields in system registers do not include a SYS_
prefix but those for GMID do. In preparation for automatic generation of
defines remove that prefix. No functional change.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220704170302.2609529-9-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull kvm updates from Paolo Bonzini:
"S390:
- ultravisor communication device driver
- fix TEID on terminating storage key ops
RISC-V:
- Added Sv57x4 support for G-stage page table
- Added range based local HFENCE functions
- Added remote HFENCE functions based on VCPU requests
- Added ISA extension registers in ONE_REG interface
- Updated KVM RISC-V maintainers entry to cover selftests support
ARM:
- Add support for the ARMv8.6 WFxT extension
- Guard pages for the EL2 stacks
- Trap and emulate AArch32 ID registers to hide unsupported features
- Ability to select and save/restore the set of hypercalls exposed to
the guest
- Support for PSCI-initiated suspend in collaboration with userspace
- GICv3 register-based LPI invalidation support
- Move host PMU event merging into the vcpu data structure
- GICv3 ITS save/restore fixes
- The usual set of small-scale cleanups and fixes
x86:
- New ioctls to get/set TSC frequency for a whole VM
- Allow userspace to opt out of hypercall patching
- Only do MSR filtering for MSRs accessed by rdmsr/wrmsr
AMD SEV improvements:
- Add KVM_EXIT_SHUTDOWN metadata for SEV-ES
- V_TSC_AUX support
Nested virtualization improvements for AMD:
- Support for "nested nested" optimizations (nested vVMLOAD/VMSAVE,
nested vGIF)
- Allow AVIC to co-exist with a nested guest running
- Fixes for LBR virtualizations when a nested guest is running, and
nested LBR virtualization support
- PAUSE filtering for nested hypervisors
Guest support:
- Decoupling of vcpu_is_preempted from PV spinlocks"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (199 commits)
KVM: x86: Fix the intel_pt PMI handling wrongly considered from guest
KVM: selftests: x86: Sync the new name of the test case to .gitignore
Documentation: kvm: reorder ARM-specific section about KVM_SYSTEM_EVENT_SUSPEND
x86, kvm: use correct GFP flags for preemption disabled
KVM: LAPIC: Drop pending LAPIC timer injection when canceling the timer
x86/kvm: Alloc dummy async #PF token outside of raw spinlock
KVM: x86: avoid calling x86 emulator without a decoded instruction
KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak
x86/fpu: KVM: Set the base guest FPU uABI size to sizeof(struct kvm_xsave)
s390/uv_uapi: depend on CONFIG_S390
KVM: selftests: x86: Fix test failure on arch lbr capable platforms
KVM: LAPIC: Trace LAPIC timer expiration on every vmentry
KVM: s390: selftest: Test suppression indication on key prot exception
KVM: s390: Don't indicate suppression on dirtying, failing memop
selftests: drivers/s390x: Add uvdevice tests
drivers/s390/char: Add Ultravisor io device
MAINTAINERS: Update KVM RISC-V entry to cover selftests support
RISC-V: KVM: Introduce ISA extension register
RISC-V: KVM: Cleanup stale TLB entries when host CPU changes
RISC-V: KVM: Add remote HFENCE functions based on VCPU requests
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Marginally optimise __delay() by using a WFIT/WFET sequence.
It probably is a win if no interrupt fires during the delay.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419182755.601427-11-maz@kernel.org
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core
----
- Support TCPv6 segmentation offload with super-segments larger than
64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).
- Generalize skb freeing deferral to per-cpu lists, instead of
per-socket lists.
- Add a netdev statistic for packets dropped due to L2 address
mismatch (rx_otherhost_dropped).
- Continue work annotating skb drop reasons.
- Accept alternative netdev names (ALT_IFNAME) in more netlink
requests.
- Add VLAN support for AF_PACKET SOCK_RAW GSO.
- Allow receiving skb mark from the socket as a cmsg.
- Enable memcg accounting for veth queues, sysctl tables and IPv6.
BPF
---
- Add libbpf support for User Statically-Defined Tracing (USDTs).
- Speed up symbol resolution for kprobes multi-link attachments.
- Support storing typed pointers to referenced and unreferenced
objects in BPF maps.
- Add support for BPF link iterator.
- Introduce access to remote CPU map elements in BPF per-cpu map.
- Allow middle-of-the-road settings for the
kernel.unprivileged_bpf_disabled sysctl.
- Implement basic types of dynamic pointers e.g. to allow for
dynamically sized ringbuf reservations without extra memory copies.
Protocols
---------
- Retire port only listening_hash table, add a second bind table
hashed by port and address. Avoid linear list walk when binding to
very popular ports (e.g. 443).
- Add bridge FDB bulk flush filtering support allowing user space to
remove all FDB entries matching a condition.
- Introduce accept_unsolicited_na sysctl for IPv6 to implement
router-side changes for RFC9131.
- Support for MPTCP path manager in user space.
- Add MPTCP support for fallback to regular TCP for connections that
have never connected additional subflows or transmitted
out-of-sequence data (partial support for RFC8684 fallback).
- Avoid races in MPTCP-level window tracking, stabilize and improve
throughput.
- Support lockless operation of GRE tunnels with seq numbers enabled.
- WiFi support for host based BSS color collision detection.
- Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.
- Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).
- Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).
- Allow matching on the number of VLAN tags via tc-flower.
- Add tracepoint for tcp_set_ca_state().
Driver API
----------
- Improve error reporting from classifier and action offload.
- Add support for listing line cards in switches (devlink).
- Add helpers for reporting page pool statistics with ethtool -S.
- Add support for reading clock cycles when using PTP virtual clocks,
instead of having the driver convert to time before reporting. This
makes it possible to report time from different vclocks.
- Support configuring low-latency Tx descriptor push via ethtool.
- Separate Clause 22 and Clause 45 MDIO accesses more explicitly.
New hardware / drivers
----------------------
- Ethernet:
- Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
- Sunplus SP7021 SoC (sp7021_emac)
- Add support for Renesas RZ/V2M (in ravb)
- Add support for MediaTek mt7986 switches (in mtk_eth_soc)
- Ethernet PHYs:
- ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
- TI DP83TD510 PHY
- Microchip LAN8742/LAN88xx PHYs
- WiFi:
- Driver for pureLiFi X, XL, XC devices (plfxlc)
- Driver for Silicon Labs devices (wfx)
- Support for WCN6750 (in ath11k)
- Support Realtek 8852ce devices (in rtw89)
- Mobile:
- MediaTek T700 modems (Intel 5G 5000 M.2 cards)
- CAN:
- ctucanfd: add support for CTU CAN FD open-source IP core from
Czech Technical University in Prague
Drivers
-------
- Delete a number of old drivers still using virt_to_bus().
- Ethernet NICs:
- intel: support TSO on tunnels MPLS
- broadcom: support multi-buffer XDP
- nfp: support VF rate limiting
- sfc: use hardware tx timestamps for more than PTP
- mlx5: multi-port eswitch support
- hyper-v: add support for XDP_REDIRECT
- atlantic: XDP support (including multi-buffer)
- macb: improve real-time perf by deferring Tx processing to NAPI
- High-speed Ethernet switches:
- mlxsw: implement basic line card information querying
- prestera: add support for traffic policing on ingress and egress
- Embedded Ethernet switches:
- lan966x: add support for packet DMA (FDMA)
- lan966x: add support for PTP programmable pins
- ti: cpsw_new: enable bc/mc storm prevention
- Qualcomm 802.11ax WiFi (ath11k):
- Wake-on-WLAN support for QCA6390 and WCN6855
- device recovery (firmware restart) support
- support setting Specific Absorption Rate (SAR) for WCN6855
- read country code from SMBIOS for WCN6855/QCA6390
- enable keep-alive during WoWLAN suspend
- implement remain-on-channel support
- MediaTek WiFi (mt76):
- support Wireless Ethernet Dispatch offloading packet movement
between the Ethernet switch and WiFi interfaces
- non-standard VHT MCS10-11 support
- mt7921 AP mode support
- mt7921 IPv6 NS offload support
- Ethernet PHYs:
- micrel: ksz9031/ksz9131: cabletest support
- lan87xx: SQI support for T1 PHYs
- lan937x: add interrupt support for link detection"
* tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1809 commits)
ptp: ocp: Add firmware header checks
ptp: ocp: fix PPS source selector debugfs reporting
ptp: ocp: add .init function for sma_op vector
ptp: ocp: vectorize the sma accessor functions
ptp: ocp: constify selectors
ptp: ocp: parameterize input/output sma selectors
ptp: ocp: revise firmware display
ptp: ocp: add Celestica timecard PCI ids
ptp: ocp: Remove #ifdefs around PCI IDs
ptp: ocp: 32-bit fixups for pci start address
Revert "net/smc: fix listen processing for SMC-Rv2"
ath6kl: Use cc-disable-warning to disable -Wdangling-pointer
selftests/bpf: Dynptr tests
bpf: Add dynptr data slices
bpf: Add bpf_dynptr_read and bpf_dynptr_write
bpf: Dynptr support for ring buffers
bpf: Add bpf_dynptr_from_mem for local dynptrs
bpf: Add verifier support for dynptrs
bpf: Suppress 'passing zero to PTR_ERR' warning
bpf: Introduce bpf_arch_text_invalidate for bpf_prog_pack
...
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch introduces ldr/str with immediate offset support to simplify
the JIT implementation of BPF LDX/STX instructions on arm64. Although
arm64 ldr/str immediate is available in pre-index, post-index and
unsigned offset forms, the unsigned offset form is sufficient for BPF,
so this patch only adds this type.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220321152852.2334294-2-xukuohai@huawei.com
|
|/
|
|
|
|
|
|
|
|
|
| |
Invoking user_ldst to explicitly add a post-increment of 0 is silly.
Just use a normal USER() annotation and save the redundant instruction.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Tong Tiangen <tongtiangen@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20220420030418.3189040-6-tongtiangen@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- hwrng core now credits for low-quality RNG devices.
Algorithms:
- Optimisations for neon aes on arm/arm64.
- Add accelerated crc32_be on arm64.
- Add ffdheXYZ(dh) templates.
- Disallow hmac keys < 112 bits in FIPS mode.
- Add AVX assembly implementation for sm3 on x86.
Drivers:
- Add missing local_bh_disable calls for crypto_engine callback.
- Ensure BH is disabled in crypto_engine callback path.
- Fix zero length DMA mappings in ccree.
- Add synchronization between mailbox accesses in octeontx2.
- Add Xilinx SHA3 driver.
- Add support for the TDES IP available on sama7g5 SoC in atmel"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (137 commits)
crypto: xilinx - Turn SHA into a tristate and allow COMPILE_TEST
MAINTAINERS: update HPRE/SEC2/TRNG driver maintainers list
crypto: dh - Remove the unused function dh_safe_prime_dh_alg()
hwrng: nomadik - Change clk_disable to clk_disable_unprepare
crypto: arm64 - cleanup comments
crypto: qat - fix initialization of pfvf rts_map_msg structures
crypto: qat - fix initialization of pfvf cap_msg structures
crypto: qat - remove unneeded assignment
crypto: qat - disable registration of algorithms
crypto: hisilicon/qm - fix memset during queues clearing
crypto: xilinx: prevent probing on non-xilinx hardware
crypto: marvell/octeontx - Use swap() instead of open coding it
crypto: ccree - Fix use after free in cc_cipher_exit()
crypto: ccp - ccp_dmaengine_unregister release dma channels
crypto: octeontx2 - fix missing unlock
hwrng: cavium - fix NULL but dereferenced coccicheck error
crypto: cavium/nitrox - don't cast parameter in bit operations
crypto: vmx - add missing dependencies
MAINTAINERS: Add maintainer for Xilinx ZynqMP SHA3 driver
crypto: xilinx - Add Xilinx SHA3 driver
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Modern compilers are perfectly capable of extracting parallelism from
the XOR routines, provided that the prototypes reflect the nature of the
input accurately, in particular, the fact that the input vectors are
expected not to overlap. This is not documented explicitly, but is
implied by the interchangeability of the various C routines, some of
which use temporary variables while others don't: this means that these
routines only behave identically for non-overlapping inputs.
So let's decorate these input vectors with the __restrict modifier,
which informs the compiler that there is no overlap. While at it, make
the input-only vectors pointer-to-const as well.
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/563
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It makes no sense to leave crc32_be using the generic code while we
only accelerate the little-endian ops.
Even though the big-endian form doesn't fit as smoothly into the arm64,
we can speed it up and avoid hitting the D cache.
Tested on Cortex-A53. Without acceleration:
crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
crc32: self tests passed, processed 225944 bytes in 192240 nsec
crc32c: CRC_LE_BITS = 64
crc32c: self tests passed, processed 112972 bytes in 21360 nsec
With acceleration:
crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
crc32: self tests passed, processed 225944 bytes in 53480 nsec
crc32c: CRC_LE_BITS = 64
crc32c: self tests passed, processed 112972 bytes in 21480 nsec
Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | | |
* for-next/strings:
Revert "arm64: Mitigate MTE issues with str{n}cmp()"
arm64: lib: Import latest version of Arm Optimized Routines' strncmp
arm64: lib: Import latest version of Arm Optimized Routines' strcmp
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This reverts commit 59a68d4138086c015ab8241c3267eec5550fbd44.
Now that the str{n}cmp functions have been updated to handle MTE
properly, the workaround to use the generic functions is no longer
needed.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220301101435.19327-4-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Import the latest version of the Arm Optimized Routines strncmp function based
on the upstream code of string/aarch64/strncmp.S at commit 189dfefe37d5 from:
https://github.com/ARM-software/optimized-routines
This latest version includes MTE support.
Note that for simplicity Arm have chosen to contribute this code to Linux under
GPLv2 rather than the original MIT OR Apache-2.0 WITH LLVM-exception license.
Arm is the sole copyright holder for this code.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220301101435.19327-3-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Import the latest version of the Arm Optimized Routines strcmp function based
on the upstream code of string/aarch64/strcmp.S at commit 189dfefe37d5 from:
https://github.com/ARM-software/optimized-routines
This latest version includes MTE support.
Note that for simplicity Arm have chosen to contribute this code to Linux under
GPLv2 rather than the original MIT OR Apache-2.0 WITH LLVM-exception license.
Arm is the sole copyright holder for this code.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220301101435.19327-2-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* for-next/linkage:
arm64: module: remove (NOLOAD) from linker script
linkage: remove SYM_FUNC_{START,END}_ALIAS()
x86: clean up symbol aliasing
arm64: clean up symbol aliasing
linkage: add SYM_FUNC_ALIAS{,_LOCAL,_WEAK}()
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Now that we have SYM_FUNC_ALIAS() and SYM_FUNC_ALIAS_WEAK(), use those
to simplify and more consistently define function aliases across
arch/arm64.
Aliases are now defined in terms of a canonical function name. For
position-independent functions I've made the __pi_<func> name the
canonical name, and defined other alises in terms of this.
The SYM_FUNC_{START,END}_PI(func) macros obscure the __pi_<func> name,
and make this hard to seatch for. The SYM_FUNC_START_WEAK_PI() macro
also obscures the fact that the __pi_<func> fymbol is global and the
<func> symbol is weak. For clarity, I have removed these macros and used
SYM_FUNC_{START,END}() directly with the __pi_<func> name.
For example:
SYM_FUNC_START_WEAK_PI(func)
... asm insns ...
SYM_FUNC_END_PI(func)
EXPORT_SYMBOL(func)
... becomes:
SYM_FUNC_START(__pi_func)
... asm insns ...
SYM_FUNC_END(__pi_func)
SYM_FUNC_ALIAS_WEAK(func, __pi_func)
EXPORT_SYMBOL(func)
For clarity, where there are multiple annotations such as
EXPORT_SYMBOL(), I've tried to keep annotations grouped by symbol. For
example, where a function has a name and an alias which are both
exported, this is organised as:
SYM_FUNC_START(func)
... asm insns ...
SYM_FUNC_END(func)
EXPORT_SYMBOL(func)
SYM_FUNC_ALIAS(alias, func)
EXPORT_SYMBOL(alias)
For consistency with the other string functions, I've defined strrchr as
a position-independent function, as it can safely be used as such even
though we have no users today.
As we no longer use SYM_FUNC_{START,END}_ALIAS(), our local copies are
removed. The common versions will be removed by a subsequent patch.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Mark Brown <broonie@kernel.org>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220216162229.1076788-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | | |
* for-next/insn:
arm64: insn: add encoders for atomic operations
arm64: move AARCH64_BREAK_FAULT into insn-def.h
arm64: insn: Generate 64 bit mask immediates correctly
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
It is a preparation patch for eBPF atomic supports under arm64. eBPF
needs support atomic[64]_fetch_add, atomic[64]_[fetch_]{and,or,xor} and
atomic[64]_{xchg|cmpxchg}. The ordering semantics of eBPF atomics are
the same with the implementations in linux kernel.
Add three helpers to support LDCLR/LDEOR/LDSET/SWP, CAS and DMB
instructions. STADD/STCLR/STEOR/STSET are simply encoded as aliases for
LDADD/LDCLR/LDEOR/LDSET with XZR as the destination register, so no extra
helper is added. atomic_fetch_add() and other atomic ops needs support for
STLXR instruction, so extend enum aarch64_insn_ldst_type to do that.
LDADD/LDEOR/LDSET/SWP and CAS instructions are only available when LSE
atomics is enabled, so just return AARCH64_BREAK_FAULT directly in
these newly-added helpers if CONFIG_ARM64_LSE_ATOMICS is disabled.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20220217072232.1186625-3-houtao1@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When the insn framework is used to encode an AND/ORR/EOR instruction,
aarch64_encode_immediate() is used to pick the immr imms values.
If the immediate is a 64bit mask, with bit 63 set, and zeros in any
of the upper 32 bits, the immr value is incorrectly calculated meaning
the wrong mask is generated.
For example, 0x8000000000000001 should have an immr of 1, but 32 is used,
meaning the resulting mask is 0x0000000300000000.
It would appear eBPF is unable to hit these cases, as build_insn()'s
imm value is a s32, so when used with BPF_ALU64, the sign-extended
u64 immediate would always have all-1s or all-0s in the upper 32 bits.
KVM does not generate a va_mask with any of the top bits set as these
VA wouldn't be usable with TTBR0_EL2.
This happens because the rotation is calculated from fls(~imm), which
takes an unsigned int, but the immediate may be 64bit.
Use fls64() so the 64bit mask doesn't get truncated to a u32.
Signed-off-by: James Morse <james.morse@arm.com>
Brown-paper-bag-for: Marc Zyngier <maz@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127162127.2391947-4-james.morse@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|/
|
|
|
|
|
|
|
|
|
|
| |
Rather than explicitly calculating the number of bytes for a compact tag
storage format corresponding to a page, just add a MTE_PAGE_TAG_STORAGE
macro. With the current MTE implementation of 4 bits per tag, we store
2 tags in a byte.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Luis Machado <luis.machado@linaro.org>
Link: https://lore.kernel.org/r/20220131165456.2160675-4-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
'for-next/stacktrace', 'for-next/xor-neon', 'for-next/kasan', 'for-next/armv8_7-fp', 'for-next/atomics', 'for-next/bti', 'for-next/sve', 'for-next/kselftest' and 'for-next/kcsan', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: (32 commits)
arm64: perf: Don't register user access sysctl handler multiple times
drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
arm64: perf: Support new DT compatibles
arm64: perf: Simplify registration boilerplate
arm64: perf: Support Denver and Carmel PMUs
drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
docs: perf: Add description for HiSilicon PCIe PMU driver
dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings
drivers: perf: Add LLC-TAD perf counter support
perf/smmuv3: Synthesize IIDR from CoreSight ID registers
perf/smmuv3: Add devicetree support
dt-bindings: Add Arm SMMUv3 PMCG binding
perf/arm-cmn: Add debugfs topology info
perf/arm-cmn: Add CI-700 Support
dt-bindings: perf: arm-cmn: Add CI-700
perf/arm-cmn: Support new IP features
perf/arm-cmn: Demarcate CMN-600 specifics
perf/arm-cmn: Move group validation data off-stack
perf/arm-cmn: Optimise DTC counter accesses
...
* for-next/misc:
: Miscellaneous patches
arm64: Use correct method to calculate nomap region boundaries
arm64: Drop outdated links in comments
arm64: errata: Fix exec handling in erratum 1418040 workaround
arm64: Unhash early pointer print plus improve comment
asm-generic: introduce io_stop_wc() and add implementation for ARM64
arm64: remove __dma_*_area() aliases
docs/arm64: delete a space from tagged-address-abi
arm64/fp: Add comments documenting the usage of state restore functions
arm64: mm: Use asid feature macro for cheanup
arm64: mm: Rename asid2idx() to ctxid2asid()
arm64: kexec: reduce calls to page_address()
arm64: extable: remove unused ex_handler_t definition
arm64: entry: Use SDEI event constants
arm64: Simplify checking for populated DT
arm64/kvm: Fix bitrotted comment for SVE handling in handle_exit.c
* for-next/cache-ops-dzp:
: Avoid DC instructions when DCZID_EL0.DZP == 1
arm64: mte: DC {GVA,GZVA} shouldn't be used when DCZID_EL0.DZP == 1
arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1
* for-next/stacktrace:
: Unify the arm64 unwind code
arm64: Make some stacktrace functions private
arm64: Make dump_backtrace() use arch_stack_walk()
arm64: Make profile_pc() use arch_stack_walk()
arm64: Make return_address() use arch_stack_walk()
arm64: Make __get_wchan() use arch_stack_walk()
arm64: Make perf_callchain_kernel() use arch_stack_walk()
arm64: Mark __switch_to() as __sched
arm64: Add comment for stack_info::kr_cur
arch: Make ARCH_STACKWALK independent of STACKTRACE
* for-next/xor-neon:
: Use SHA3 instructions to speed up XOR
arm64/xor: use EOR3 instructions when available
* for-next/kasan:
: Log potential KASAN shadow aliases
arm64: mm: log potential KASAN shadow alias
arm64: mm: use die_kernel_fault() in do_mem_abort()
* for-next/armv8_7-fp:
: Add HWCAPS for ARMv8.7 FEAT_AFP amd FEAT_RPRES
arm64: cpufeature: add HWCAP for FEAT_RPRES
arm64: add ID_AA64ISAR2_EL1 sys register
arm64: cpufeature: add HWCAP for FEAT_AFP
* for-next/atomics:
: arm64 atomics clean-ups and codegen improvements
arm64: atomics: lse: define RETURN ops in terms of FETCH ops
arm64: atomics: lse: improve constraints for simple ops
arm64: atomics: lse: define ANDs in terms of ANDNOTs
arm64: atomics lse: define SUBs in terms of ADDs
arm64: atomics: format whitespace consistently
* for-next/bti:
: BTI clean-ups
arm64: Ensure that the 'bti' macro is defined where linkage.h is included
arm64: Use BTI C directly and unconditionally
arm64: Unconditionally override SYM_FUNC macros
arm64: Add macro version of the BTI instruction
arm64: ftrace: add missing BTIs
arm64: kexec: use __pa_symbol(empty_zero_page)
arm64: update PAC description for kernel
* for-next/sve:
: SVE code clean-ups and refactoring in prepararation of Scalable Matrix Extensions
arm64/sve: Minor clarification of ABI documentation
arm64/sve: Generalise vector length configuration prctl() for SME
arm64/sve: Make sysctl interface for SVE reusable by SME
* for-next/kselftest:
: arm64 kselftest additions
kselftest/arm64: Add pidbench for floating point syscall cases
kselftest/arm64: Add a test program to exercise the syscall ABI
kselftest/arm64: Allow signal tests to trigger from a function
kselftest/arm64: Parameterise ptrace vector length information
* for-next/kcsan:
: Enable KCSAN for arm64
arm64: Enable KCSAN
|
| |_|/
|/| |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Now we have a macro for BTI C that looks like a regular instruction change
all the users of the current BTI_C macro to just emit a BTI C directly and
remove the macro.
This does mean that we now unconditionally BTI annotate all assembly
functions, meaning that they are worse in this respect than code generated
by the compiler. The overhead should be minimal for implementations with a
reasonable HINT implementation.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211214152714.2380849-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| | |
Use the EOR3 instruction to implement xor_blocks() if the instruction is
available, which is the case if the CPU implements the SHA-3 extension.
This is about 20% faster on Apple M1 when using the 5-way version.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20211213140252.2856053-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently, mte_set_mem_tag_range() and mte_zero_clear_page_tags() use
DC {GVA,GZVA} unconditionally. But, they should make sure that
DCZID_EL0.DZP, which indicates whether or not use of those instructions
is prohibited, is zero when using those instructions.
Use ST{G,ZG,Z2G} instead when DCZID_EL0.DZP == 1.
Fixes: 013bb59dbb7c ("arm64: mte: handle tags zeroing at page allocation time")
Fixes: 3d0cca0b02ac ("kasan: speed up mte_set_mem_tag_range")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Link: https://lore.kernel.org/r/20211206004736.1520989-3-reijiw@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|