author: Daniel Hu <Daniel.Hu@arm.com> 2022-02-07 11:17:06 +0100
committer: Pauli <pauli@openssl.org> 2022-05-03 06:37:46 +0200
commit: b1b2146ded9ce5a84c62f30c6c4a922b449f6c90
tree: 969d007a0e310df537f7f9495b353bbad4e984d4 /crypto/arm_arch.h
parent: md5: add assembly implementation for aarch64
Acceleration of chacha20 on aarch64 by SVE

This patch accelerates chacha20 on aarch64 when Scalable Vector
Extension (SVE) is supported by CPU. Tested on modern micro-architecture
with 256-bit SVE, it has the potential to improve performance up to 20%.

The solution takes a hybrid approach. SVE will handle multi-blocks that
fit the SVE vector length, with Neon/Scalar to process any tail data.

Test result:

With SVE
type        1024 bytes    8192 bytes    16384 bytes
ChaCha20    1596208.13k   1650010.79k   1653151.06k

Without SVE (by Neon/Scalar)
type        1024 bytes    8192 bytes    16384 bytes
chacha20    1355487.91k   1372678.83k   1372662.44k

The assembly code has been reviewed internally by
ARM engineer Fangming.Fang@arm.com

Signed-off-by: Daniel Hu <Daniel.Hu@arm.com>

Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/17916)

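
The hybrid approach described in the commit message can be sketched in C as a length split: when the ARMV8_SVE capability bit is set, whole batches go to the wide path and the remainder goes to the Neon/scalar tail. This is only an illustration and not the code from the patch, which implements the SVE path in assembly. The `chacha_split` type, the `split_for_sve` function, the `capabilities` and `vector_bits` parameters, and the assumption that one 64-byte ChaCha20 block is processed per 32-bit vector lane are all hypothetical; only the ARMV8_SVE bit value is taken from the arm_arch.h change.

```c
#include <stddef.h>

/* Bit value as added to crypto/arm_arch.h by this commit. */
#define ARMV8_SVE (1u << 13)

/* A ChaCha20 block is 64 bytes. */
#define CHACHA_BLOCK_BYTES 64u

/* Hypothetical helper type: how many bytes each path would handle. */
typedef struct {
    size_t wide_bytes; /* handled by the SVE path */
    size_t tail_bytes; /* left for the Neon/scalar path */
} chacha_split;

/*
 * Hypothetical sketch of the hybrid split.
 * capabilities: a bitmask using the ARMV8_* flag layout.
 * vector_bits:  the SVE vector length in bits (for example 256).
 * Assumption: one block per 32-bit lane, so a batch is
 * (vector_bits / 32) blocks. The real assembly may batch differently.
 */
static chacha_split split_for_sve(size_t len, unsigned int capabilities,
                                  size_t vector_bits)
{
    chacha_split s = { 0, len };

    /* Without SVE, everything stays on the Neon/scalar path. */
    if ((capabilities & ARMV8_SVE) == 0 || vector_bits < 128)
        return s;

    size_t lanes = vector_bits / 32;
    size_t batch = lanes * CHACHA_BLOCK_BYTES;

    /* Only whole batches fit the vector length; the rest is tail data. */
    s.wide_bytes = (len / batch) * batch;
    s.tail_bytes = len - s.wide_bytes;
    return s;
}
```

Under these assumptions, a 1000-byte input with 256-bit vectors gives a 512-byte batch, so 512 bytes would take the wide path and 488 bytes would fall to the tail path. With the ARMV8_SVE bit clear, all 1000 bytes stay on the Neon/scalar path.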
Diffstat (limited to 'crypto/arm_arch.h')
-rw-r--r-- crypto/arm_arch.h 2
1 files changed, 2 insertions, 0 deletions
diff --git a/crypto/arm_arch.h b/crypto/arm_arch.h
index 33acbd99c0..5fc0905885 100644
--- a/crypto/arm_arch.h
+++ b/crypto/arm_arch.h
@@ -83,6 +83,8 @@ extern unsigned int OPENSSL_armv8_rsa_neonized;
# define ARMV8_SM4 (1<<10)
# define ARMV8_SHA3 (1<<11)
# define ARMV8_UNROLL8_EOR3 (1<<12)
+# define ARMV8_SVE (1<<13)
+# define ARMV8_SVE2 (1<<14)
/*
* MIDR_EL1 system register