author | Daniel Hu <Daniel.Hu@arm.com> | 2022-02-07 11:17:06 +0100 |
---|---|---|
committer | Pauli <pauli@openssl.org> | 2022-05-03 06:37:46 +0200 |
commit | b1b2146ded9ce5a84c62f30c6c4a922b449f6c90 (patch) | |
tree | 969d007a0e310df537f7f9495b353bbad4e984d4 /crypto/arm_arch.h | |
parent | md5: add assembly implementation for aarch64 (diff) | |
Acceleration of chacha20 on aarch64 by SVE
This patch accelerates ChaCha20 on AArch64 when the CPU supports the Scalable
Vector Extension (SVE). Tested on a modern micro-architecture with 256-bit SVE,
it improves performance by up to about 20%.
The solution takes a hybrid approach: SVE processes as many blocks as fit the
SVE vector length, and Neon/scalar code processes any remaining tail data.
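The hybrid split can be illustrated with a small C sketch. It assumes, for illustration only, that the SVE path handles one 64-byte ChaCha20 block per 32-bit vector lane, so one SVE pass covers `vl_bits / 32` blocks; the helper names `sve_blocks_per_pass` and `split_chacha_input` are hypothetical and do not come from OpenSSL.

```c
#include <assert.h>
#include <stddef.h>

/* ChaCha20 produces keystream in 64-byte blocks. */
#define CHACHA_BLOCK_BYTES 64u

/*
 * Hypothetical helper. Assumption: one ChaCha20 block per 32-bit SVE lane,
 * so a single SVE pass covers vl_bits / 32 blocks.
 */
static size_t sve_blocks_per_pass(unsigned vl_bits)
{
    /* SVE vector lengths are multiples of 128 bits, from 128 to 2048. */
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    return vl_bits / 32;
}

/*
 * Hypothetical helper: split len bytes into the part handled by whole SVE
 * passes and the tail that is left to the Neon/scalar code.
 */
static void split_chacha_input(size_t len, unsigned vl_bits,
                               size_t *sve_bytes, size_t *tail_bytes)
{
    size_t chunk = sve_blocks_per_pass(vl_bits) * CHACHA_BLOCK_BYTES;

    *sve_bytes = (len / chunk) * chunk; /* whole SVE passes only */
    *tail_bytes = len - *sve_bytes;     /* remainder for Neon/scalar */
}
```

Under these assumptions, 256-bit SVE gives 8 blocks (512 bytes) per pass, so a 1000-byte input would be split into 512 bytes for SVE and a 488-byte tail.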
Test result (throughput; k = 1000s of bytes per second):

| Build | type | 1024 bytes | 8192 bytes | 16384 bytes |
|---|---|---|---|---|
| With SVE | ChaCha20 | 1596208.13k | 1650010.79k | 1653151.06k |
| Without SVE (Neon/scalar) | chacha20 | 1355487.91k | 1372678.83k | 1372662.44k |
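The "up to 20%" figure can be checked directly against the test results. The following C sketch computes the relative gain per buffer size; `speedup_percent` is a hypothetical helper written only for this check.

```c
/* Measured throughput from the test results, in 1000s of bytes per second.
 * Index 0, 1, 2 = 1024, 8192, 16384-byte buffers. */
static const double with_sve[3]    = { 1596208.13, 1650010.79, 1653151.06 };
static const double without_sve[3] = { 1355487.91, 1372678.83, 1372662.44 };

/* Relative gain of the SVE build over the Neon/scalar build, in percent. */
static double speedup_percent(double with, double without)
{
    return (with / without - 1.0) * 100.0;
}
```

For the three buffer sizes this gives roughly 17.8%, 20.2% and 20.4%, consistent with the claim above.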
The assembly code has been reviewed internally by an ARM engineer,
Fangming.Fang@arm.com.
Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17916)
Diffstat (limited to 'crypto/arm_arch.h')
-rw-r--r-- | crypto/arm_arch.h | 2 |
1 files changed, 2 insertions, 0 deletions
```diff
diff --git a/crypto/arm_arch.h b/crypto/arm_arch.h
index 33acbd99c0..5fc0905885 100644
--- a/crypto/arm_arch.h
+++ b/crypto/arm_arch.h
@@ -83,6 +83,8 @@ extern unsigned int OPENSSL_armv8_rsa_neonized;
 # define ARMV8_SM4 (1<<10)
 # define ARMV8_SHA3 (1<<11)
 # define ARMV8_UNROLL8_EOR3 (1<<12)
+# define ARMV8_SVE (1<<13)
+# define ARMV8_SVE2 (1<<14)
 
 /*
  * MIDR_EL1 system register
```
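The two new bits extend the ARMv8 capability mask. As a minimal sketch of how such bits are typically consumed, the C code below picks a ChaCha20 path from a capability word such as OpenSSL's `OPENSSL_armcap_P`. The enum and the functions `choose_chacha_impl` and `has_sve2` are hypothetical illustrations, not OpenSSL's actual dispatch code, which may differ.

```c
#include <stdbool.h>

/* Capability bits as added to crypto/arm_arch.h by this commit. */
#define ARMV8_SVE  (1<<13)
#define ARMV8_SVE2 (1<<14)

/* Hypothetical dispatch result, for illustration only. */
enum chacha_impl { CHACHA_IMPL_NEON_SCALAR, CHACHA_IMPL_SVE };

/*
 * Hypothetical: select the ChaCha20 code path from a capability word
 * (e.g. a copy of OPENSSL_armcap_P). SVE is used only if its bit is set.
 */
static enum chacha_impl choose_chacha_impl(unsigned int armcap)
{
    return (armcap & ARMV8_SVE) ? CHACHA_IMPL_SVE : CHACHA_IMPL_NEON_SCALAR;
}

/* Hypothetical: report whether the SVE2 bit is set. */
static bool has_sve2(unsigned int armcap)
{
    return (armcap & ARMV8_SVE2) != 0;
}
```

Keeping each feature in its own bit lets callers test SVE and SVE2 independently with a single mask operation.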