Cryptogams SHA
Cryptogams is Andy Polyakov's project used to develop high-speed cryptographic primitives and share them with other developers. This wiki article shows you how to use the Cryptogams ARMv4 SHA-1 implementation. According to the head notes, the ARMv4 implementation runs at around 6.5 cycles per byte (cpb). Typical C/C++ implementations run at around 10 to 20 cpb, so Andy's routines should outperform them.
Andy's Cryptogams implementations are provided by OpenSSL, but they are also available standalone under a BSD license. The BSD-style license is permissive and allows developers to use Andy's high-speed cryptography without an OpenSSL dependency or OpenSSL's licensing terms.
There are 6 steps to the process. The first step obtains the sources. The second step creates an ASM source file. The third step compiles and assembles the source file into an object file. The fourth step determines the API. The fifth step creates a C header file. The final step integrates the object file into a program.
A few cautions before you begin. First, you are going to examine undocumented features of the OpenSSL library to learn how to work with the Cryptogams sources. The Cryptogams sources are stable, but things could change over time. Second, the ARMv4 implementation only hashes full SHA-1 blocks. You are responsible for things like data alignment, padding and side-channel countermeasures.
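Since padding is left to the caller, a minimal sketch of SHA-1 padding is shown below. The helper name sha1_pad_message and its buffer-capacity parameter are made up for illustration and are not part of Cryptogams or OpenSSL. It assumes the whole message fits in memory and that the caller supplies an output buffer large enough for the padded result.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of Cryptogams or OpenSSL.
 * Copies msg into out and appends SHA-1 padding: a 0x80 byte, zero fill,
 * and the message length in bits as a 64-bit big-endian integer.
 * Returns the number of 64-byte blocks written, or 0 if out is too small. */
size_t sha1_pad_message(const uint8_t *msg, size_t len,
                        uint8_t *out, size_t out_cap)
{
    /* Room for the message, the 0x80 marker and the 8-byte length field,
     * rounded up to a whole number of 64-byte blocks. */
    size_t blocks = (len + 1 + 8 + 63) / 64;
    size_t total = blocks * 64;
    if (total > out_cap)
        return 0;

    if (len > 0)
        memcpy(out, msg, len);
    out[len] = 0x80;
    memset(out + len + 1, 0x00, total - len - 1);

    /* Message length in bits, big-endian, in the last 8 bytes. */
    uint64_t bits = (uint64_t)len * 8;
    for (int i = 0; i < 8; ++i)
        out[total - 1 - i] = (uint8_t)(bits >> (8 * i));

    return blocks;
}
```

The return value is a block count, so it can be handed to a block-oriented hash routine directly. For an empty message the result is a single block with 0x80 in the first byte and zeros elsewhere.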
Obtain Source Files
There are two source files you need for Cryptogams SHA. The first is arm-xlate.pl and the second is sha1-armv4-large.pl. They are available in the OpenSSL sources. The following commands fetch OpenSSL and then peel off the two Cryptogams files of interest.
# Clone OpenSSL for the latest Cryptogams sources
git clone https://github.com/openssl/openssl.git

mkdir cryptogams/
cp ./openssl/crypto/perlasm/arm-xlate.pl ./cryptogams/
cp ./openssl/crypto/sha/asm/sha1-armv4-large.pl ./cryptogams/

cd cryptogams/
Create ASM File
The second step is to run sha1-armv4-large.pl to produce an assembly language source file that can be consumed by GCC. sha1-armv4-large.pl internally calls arm-xlate.pl. linux32 is the flavor used by the translate program. sha1-armv4.S is the output filename. In the command below note the *.S file extension, which is a capital S. Do not use a lowercase s because GCC must drive the compile and assemble step, and GCC runs the C preprocessor only on files with the capital S extension.
perl sha1-armv4-large.pl linux32 sha1-armv4.S
GCC is needed to drive the process because there are C macros in the source file. Some Cryptogam source files have this requirement, while some others do not. sha1-armv4 happens to have the requirement.
$ cat sha1-armv4.S
@ Copyright 2007-2018 The OpenSSL Project Authors. All Rights Reserved.
...

#ifndef __KERNEL__
# include "arm_arch.h"
#else
# define __ARM_ARCH__ __LINUX_ARM_ARCH__
#endif
...
At this point there is an ASM file but it needs two small fixups. First, arm_arch.h is an OpenSSL source file so the dependency must be removed. Second, GCC defines __ARM_ARCH instead of __ARM_ARCH__ so a sed is needed.
To fix up the source file, execute the following two commands:
# Remove OpenSSL include
sed -i 's/# include "arm_arch.h"//g' sha1-armv4.S

# Fix GCC defines
sed -i 's/__ARM_ARCH__/__ARM_ARCH/g' sha1-armv4.S
Alternatively, instead of the two sed commands, you can open arm_arch.h, copy the defines and paste them directly into sha1-armv4.S. Take care when using arm_arch.h as it carries the OpenSSL license.
After the two fixups sha1-armv4.S is ready to be compiled by GCC.
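The second fixup relies on the compiler predefining __ARM_ARCH as an integer when it targets ARM. As a rough way to see which value your toolchain and -march setting produce, a small C helper like the following can be compiled with the same flags. The function name target_arm_arch is made up for illustration.

```c
/* Hypothetical helper: reports the ARM architecture level the compiler
 * targets, or 0 when the compiler does not define __ARM_ARCH (for example
 * when building on a non-ARM host). */
int target_arm_arch(void)
{
#if defined(__ARM_ARCH)
    return __ARM_ARCH;
#else
    return 0;
#endif
}
```

Printing the return value from a program built with the flags you plan to use for sha1-armv4.S shows the value the preprocessor conditionals in the assembly file will see.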
Compile Source File
The source file is ready to be compiled and assembled. At this point there are two choices. First, you can use ARMv5t or higher which includes Thumb instructions. The following compiles the source file with ARMv5t.
$ gcc -march=armv5t -c sha1-armv4.S
The second choice uses ARMv4 and avoids Thumb instructions. If you want to avoid Thumb then add -marm to your compile command.
$ gcc -march=armv4 -marm -c sha1-armv4.S
Using the ARMv4 build as an example, you now have an object file with the following symbols. Symbols with a capital T are public and exported. Symbols with a lowercase t are private and should not be used.
$ gcc -march=armv4 -marm -c sha1-armv4.S
$ nm sha1-armv4.o
00000000 T sha1_block_data_order
And you can inspect the generated code with objdump.
$ objdump --disassemble sha1-armv4.o

sha1-armv4.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <sha1_block_data_order>:
   0:   e92d5ff0    push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr}
   4:   e0812302    add     r2, r1, r2, lsl #6
   8:   e89000f8    ldm     r0, {r3, r4, r5, r6, r7}
   c:   e59f858c    ldr     r8, [pc, #1420] ; 5a0 <sha1_block_data_order+0x5a0>
  10:   e1a0e00d    mov     lr, sp
  14:   e24dd03c    sub     sp, sp, #60     ; 0x3c
  18:   e1a05f65    ror     r5, r5, #30
  1c:   e1a06f66    ror     r6, r6, #30
  ...
Determine API
The next step is to determine the API so you can call it from a C program. Unfortunately the API is not documented and you have to dig around the OpenSSL sources. Fortunately there is only one function of interest, called sha1_block_data_order.
A quick grep of OpenSSL sources reveals the following for sha1_block_data_order.
openssl$ grep -nIR sha1_block_data_order | grep '\.c'
crypto/evp/e_sha_cbc_hmac_sha1.c:95: void sha1_block_data_order(void *c, const void *p, size_t len);
crypto/evp/e_sha_cbc_hmac_sha1.c:115: sha1_block_data_order(c, ptr, len / SHA_CBLOCK);
crypto/evp/e_sha_cbc_hmac_sha1.c:615: sha1_block_data_order(&key->md, data, 1);
crypto/evp/e_sha_cbc_hmac_sha1.c:631: sha1_block_data_order(&key->md, data, 1);
...
We need several more symbols: OPENSSL_armcap_P, ARMV7_NEON and ARMV8_SHA1.
$ grep -nIR OPENSSL_armcap_P
...
crypto/armcap.c:20:unsigned int OPENSSL_armcap_P = 0;
Lather, rinse, repeat for ARMV7_NEON and ARMV8_SHA1.
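The prototype found by the grep takes a state pointer, a data pointer and a count. To cross-check your calling code on a machine without the ARM object file, a portable C routine with the same shape can stand in for it. The sketch below is a plain textbook SHA-1 compression function. It is not Andy's code, and the name sha1_block_data_order_ref is made up. It assumes the state is five 32-bit words in host byte order and that the count is in 64-byte blocks, which is how the test program in this article uses the real routine.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Hypothetical portable stand-in with the same parameter list as
 * sha1_block_data_order: state is five 32-bit words, data holds
 * blocks * 64 bytes, and blocks is a block count. */
void sha1_block_data_order_ref(void *state, const void *data, size_t blocks)
{
    uint32_t *h = (uint32_t *)state;
    const uint8_t *p = (const uint8_t *)data;

    while (blocks--) {
        /* Load 16 big-endian words, then expand to 80. */
        uint32_t w[80];
        for (int i = 0; i < 16; ++i)
            w[i] = ((uint32_t)p[4*i] << 24) | ((uint32_t)p[4*i+1] << 16) |
                   ((uint32_t)p[4*i+2] << 8) | (uint32_t)p[4*i+3];
        for (int i = 16; i < 80; ++i)
            w[i] = rotl32(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);

        uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int i = 0; i < 80; ++i) {
            uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
            uint32_t t = rotl32(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rotl32(b, 30); b = a; a = t;
        }

        /* Fold the working variables back into the state. */
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
        p += 64;
    }
}
```

Feeding both routines the same state and blocks and comparing the five output words is a quick sanity check that the assembly object is being called correctly.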
Create C Header
The fifth step creates a C header file based on information from Determine API. The header file is needed for two reasons. First, it removes the OpenSSL dependency from your project. Second, it avoids OpenSSL licensing violations.
Below is the C Header file you can use. While it is not obvious, the len parameter from Determine API is a block count, not a byte count.
/* Header file for use with Cryptogam's ARMv4 SHA1.    */
/* Also see http://www.openssl.org/~appro/cryptogams/  */
/* https://wiki.openssl.org/index.php/Cryptogams_SHA.  */

#ifndef CRYPTOGAMS_SHA1_ARMV4_H
#define CRYPTOGAMS_SHA1_ARMV4_H

#include <stddef.h>  /* size_t */

#ifdef __cplusplus
extern "C" {
#endif

extern unsigned int OPENSSL_armcap_P;

/* blocks is a count of 64-byte blocks, not a byte count */
void sha1_block_data_order(void *state, const void *data, size_t blocks);

/* Auxval caps */
#ifndef HWCAP_ARM_NEON
# define HWCAP_ARM_NEON 4096
#endif
#ifndef HWCAP_SHA1
# define HWCAP_SHA1 (1 << 5)
#endif

/* OpenSSL caps */
#define ARMV7_NEON (1<<0)
#define ARMV8_SHA1 (1<<3)

#ifdef __cplusplus
}
#endif

#endif  /* CRYPTOGAMS_SHA1_ARMV4_H */
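Because the last argument is a block count, callers that track lengths in bytes need to convert before calling. A minimal sketch of that conversion follows. The helper names and the SHA1_BLOCK_BYTES macro are made up for illustration and are not part of the header above.

```c
#include <stddef.h>

#define SHA1_BLOCK_BYTES 64u

/* Hypothetical helpers, not part of Cryptogams. */

/* Number of whole 64-byte blocks in nbytes; this is the value to pass
 * as the third argument of the hashing routine. */
size_t sha1_full_blocks(size_t nbytes)
{
    return nbytes / SHA1_BLOCK_BYTES;
}

/* Bytes left over after the whole blocks; the caller has to buffer
 * or pad these before they can be hashed. */
size_t sha1_tail_bytes(size_t nbytes)
{
    return nbytes % SHA1_BLOCK_BYTES;
}
```

With these, a 200-byte input yields three full blocks to hash now and an 8-byte tail to carry over into the next update or the final padded block.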
Test Program
The final step is to test the integration of Cryptogams SHA with your program.
$ gcc -std=c99 sha1-armv4-test.c ./sha1-armv4.o -o sha1-armv4-test.exe
$ ./sha1-armv4-test.exe
SHA1 hash of empty message: DA39A3EE5E6B4B0D...
Success!
And the test program is shown below.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>
#include "sha1-armv4.h"

/* processor caps */
unsigned int OPENSSL_armcap_P = 0;

int main(int argc, char* argv[])
{
    /* processor caps; sha1-armv4.h supplies HWCAP_ARM_NEON and */
    /* HWCAP_SHA1 when the system headers do not define them    */
    if (getauxval(AT_HWCAP) & HWCAP_ARM_NEON)
        OPENSSL_armcap_P |= ARMV7_NEON;
    if (getauxval(AT_HWCAP) & HWCAP_SHA1)
        OPENSSL_armcap_P |= ARMV8_SHA1;

    /* empty message with padding: 0x80 marker, zero fill, zero bit length */
    uint8_t message[64];
    memset(message, 0x00, sizeof(message));
    message[0] = 0x80;

    /* initial state */
    uint32_t state[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};

    /* hash one block */
    sha1_block_data_order(state, message, 1);

    /* first 8 digest bytes, taken from the state words in big-endian order */
    const uint8_t b1 = (uint8_t)(state[0] >> 24);
    const uint8_t b2 = (uint8_t)(state[0] >> 16);
    const uint8_t b3 = (uint8_t)(state[0] >>  8);
    const uint8_t b4 = (uint8_t)(state[0] >>  0);
    const uint8_t b5 = (uint8_t)(state[1] >> 24);
    const uint8_t b6 = (uint8_t)(state[1] >> 16);
    const uint8_t b7 = (uint8_t)(state[1] >>  8);
    const uint8_t b8 = (uint8_t)(state[1] >>  0);

    /* DA39A3EE5E6B4B0D... */
    printf("SHA1 hash of empty message: ");
    printf("%02X%02X%02X%02X%02X%02X%02X%02X...\n",
        b1, b2, b3, b4, b5, b6, b7, b8);

    int success = ((b1 == 0xDA) && (b2 == 0x39) && (b3 == 0xA3) && (b4 == 0xEE) &&
                   (b5 == 0x5E) && (b6 == 0x6B) && (b7 == 0x4B) && (b8 == 0x0D));

    if (success)
        printf("Success!\n");
    else
        printf("Failure!\n");

    return (success != 0 ? 0 : 1);
}
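The test program checks only the first 8 bytes of the digest. If you want the full 20-byte digest, the five state words can be serialized in big-endian order. The sketch below shows one way to do that. The helper names are made up for illustration, and it assumes the state words are held in host byte order, as the shifts in the test program imply.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes the five state words as a 20-byte
 * big-endian digest. */
void sha1_state_to_digest(const uint32_t state[5], uint8_t digest[20])
{
    for (int i = 0; i < 5; ++i) {
        digest[4*i + 0] = (uint8_t)(state[i] >> 24);
        digest[4*i + 1] = (uint8_t)(state[i] >> 16);
        digest[4*i + 2] = (uint8_t)(state[i] >>  8);
        digest[4*i + 3] = (uint8_t)(state[i] >>  0);
    }
}

/* Hypothetical helper: formats the digest as 40 uppercase hex
 * characters followed by a terminating NUL. */
void sha1_digest_to_hex(const uint8_t digest[20], char hex[41])
{
    for (int i = 0; i < 20; ++i)
        snprintf(hex + 2*i, 3, "%02X", digest[i]);
}
```

For the empty message the complete digest is DA39A3EE5E6B4B0D3255BFEF95601890AFD80709, so the same comparison used in the test program can be extended to all 20 bytes.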
Benchmarks
You can perform a rough benchmark using the code shown below. Prior to executing the benchmark program you should move the CPU from on-demand or powersave to performance mode.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <string.h>
#include <sys/auxv.h>
#include "sha1-armv4.h"

typedef unsigned char byte;

/* processor caps */
unsigned int OPENSSL_armcap_P = 0;

int main(int argc, char* argv[])
{
    /* set processor caps */
    if (getauxval(AT_HWCAP) & HWCAP_ARM_NEON)
        OPENSSL_armcap_P |= ARMV7_NEON;
    if (getauxval(AT_HWCAP) & HWCAP_SHA1)
        OPENSSL_armcap_P |= ARMV8_SHA1;

    /* each call hashes STEPS full blocks of 64 bytes */
    const unsigned int STEPS = 128;
    const size_t bufSize = (size_t)STEPS * 64;
    byte* buf = (byte*)malloc(bufSize);
    if (buf == NULL)
        return 1;
    memset(buf, 0x00, bufSize);

    double elapsed = 0.0;
    size_t total = 0;

    struct timespec start, end;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);

    uint32_t state[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};

    do
    {
        /* third argument is a block count, not a byte count */
        sha1_block_data_order(state, buf, STEPS);
        total += bufSize;

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
        elapsed = (end.tv_sec-start.tv_sec);
    }
    while (elapsed < 3 /* seconds */);

    /* Increase precision of elapsed time */
    elapsed = ((double)end.tv_sec-start.tv_sec) +
              ((double)end.tv_nsec-start.tv_nsec) / 1000 / 1000 / 1000;

    /* CPU freq of 1 GHz */
    const double cpuFreq = 1000.0*1000*1000;

    const double bytes = total;
    const double mbs = bytes / elapsed / 1024 / 1024;
    const double cpb = elapsed * cpuFreq / bytes;

    printf("%.0f bytes\n", bytes);
    printf("%.02f mbs\n", mbs);
    printf("%.02f cpb\n", cpb);

    free(buf);
    return 0;
}
The results below are from a Libre Computer Tritium H3 with a Cortex-A7 Sun7i SoC running at 1 GHz. A C/C++ SHA implementation runs about 22 cpb on the dev-board. Notice sha1-armv4.S was compiled with -march=armv7.
$ gcc -std=c99 -march=armv7 -c sha1-armv4.S -o sha1-armv7.o
$ gcc -O3 -std=c99 sha1-armv7-test.c sha1-armv7.o -o sha1-armv7-test.exe
$ ./sha1-armv7-test.exe
180994048 bytes
57.59 mbs
16.56 cpb
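The two figures in the output are tied together by the assumed clock rate: cycles per byte is the elapsed time multiplied by the CPU frequency and divided by the bytes hashed, while the throughput figure uses 1024*1024 bytes per unit. The sketch below isolates that arithmetic so it can be checked on its own. The function names are made up for illustration, and the 1 GHz clock is the same assumption the benchmark program makes.

```c
/* Hypothetical helpers mirroring the arithmetic at the end of the
 * benchmark program. */

/* Cycles per byte: seconds * cycles-per-second / bytes. */
double cycles_per_byte(double seconds, double cpu_hz, double bytes)
{
    return seconds * cpu_hz / bytes;
}

/* Throughput in units of 1024*1024 bytes per second. */
double mib_per_second(double seconds, double bytes)
{
    return bytes / seconds / 1024.0 / 1024.0;
}
```

As a consistency check on the reported run, 180994048 bytes at 57.59 units per second implies roughly 2.997 seconds of CPU time, and 2.997 seconds at 1 GHz over 180994048 bytes works out to about 16.56 cpb. If your board runs at a different clock, change the frequency passed in or the cpb figure will be off by the same ratio.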
Autotools
If you are using Autotools you can add the following to configure.ac and Makefile.am to conditionally compile sha1-armv4.S for A-32 platforms. You will need to detect ARM A-32 and set IS_ARM32 to non-0. Also see Automake Assembly Support in section 8.13 of the manual.
First, the configure.ac recipe:
# Set ASM tools
AC_SUBST([CCAS], [$CC])
AC_SUBST([CCASFLAGS], [$CFLAGS])

# Used by Makefile.am to compile sha1-armv4.S
if test "$IS_ARM32" != "0"; then

   ## Save CFLAGS
   SAVED_CFLAGS="$CFLAGS"

   CFLAGS="-march=armv7-a -Wa,--noexecstack"
   AC_MSG_CHECKING([if $CC supports $CFLAGS])
   AC_COMPILE_IFELSE(
      [AC_LANG_PROGRAM([])],
      [AC_MSG_RESULT([yes]); AC_SUBST([tr_RESULT], [1])],
      [AC_MSG_RESULT([no]); AC_SUBST([tr_RESULT], [0])]
   )

   if test "$tr_RESULT" = "1"; then
      AM_CONDITIONAL([CRYPTOGAMS_SHA1], [true])
      AC_SUBST([CRYPTOGAMS_FLAGS], [$CFLAGS])
   else
      CFLAGS="-march=armv7-a"
      AC_MSG_CHECKING([if $CC supports $CFLAGS])
      AC_COMPILE_IFELSE(
         [AC_LANG_PROGRAM([])],
         [AC_MSG_RESULT([yes]); AC_SUBST([tr_RESULT], [1])],
         [AC_MSG_RESULT([no]); AC_SUBST([tr_RESULT], [0])]
      )

      if test "$tr_RESULT" = "1"; then
         AM_CONDITIONAL([CRYPTOGAMS_SHA1], [true])
         AC_SUBST([CRYPTOGAMS_FLAGS], [$CFLAGS])
      else
         AM_CONDITIONAL([CRYPTOGAMS_SHA1], [false])
      fi
   fi

   ## Restore CFLAGS
   CFLAGS="$SAVED_CFLAGS"

else

   # Required for other platforms
   AM_CONDITIONAL([CRYPTOGAMS_SHA1], [false])

fi
Second, the Makefile.am recipe. Note that the assembler flags come from the CRYPTOGAMS_FLAGS variable substituted by configure.ac, while CRYPTOGAMS_SHA1 is only the name of the conditional:
if CRYPTOGAMS_SHA1
sha_armv4_la_SOURCES = sha1-armv4.S
sha_armv4_la_CCASFLAGS = $(AM_CFLAGS) $(CRYPTOGAMS_FLAGS)
pkginclude_HEADERS += sha1-armv4.h
endif