Cryptogams SHA
Cryptogams is Andy Polyakov's project used to develop high-speed cryptographic primitives and share them with other developers. This wiki article shows you how to use the Cryptogams ARMv4 SHA-1 implementation. According to the notes at the head of the source file, the ARMv4 implementation runs at around 6.5 cycles per byte (cpb). Typical C/C++ implementations run at around 10 to 20 cpb, so Andy's routines should outperform them.
Andy's Cryptogams implementations are provided by OpenSSL, but they are also available standalone under a BSD license. The BSD-style license is permissive and allows developers to use Andy's high-speed cryptography without an OpenSSL dependency or OpenSSL licensing terms.
There are six steps to the process. The first step obtains the sources. The second step creates an ASM source file. The third step compiles and assembles the source file into an object file. The fourth step determines the API. The fifth step creates a C header file. The final step integrates the object file into a program. Once you create the files sha1-armv4.h and sha1-armv4.S you can use sed to restore symbols back to their Cryptogams name with sed -i 's|OPENSSL|CRYPTOGAMS|g' sha1-armv4.h sha1-armv4.S.
A few cautions before you begin. First, you are going to examine undocumented features of the OpenSSL library to learn how to work with the Cryptogams sources. The Cryptogams sources are stable but things could change over time. Second, the ARMv4 implementation hashes full SHA blocks only. You are responsible for things like padding and side channel counter-measures.
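Because the routine only consumes full 64-byte blocks, the caller has to build the final padded block or blocks. The following is a minimal sketch of SHA-1 padding, assuming the caller keeps track of the total message length in bytes. The name sha1_pad_tail is hypothetical and is not part of Cryptogams or OpenSSL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of Cryptogams.
 * tail      : the last (total_len % 64) bytes of the message
 * total_len : length of the whole message in bytes
 * out       : receives one or two padded 64-byte blocks
 * Returns the number of blocks written to out (1 or 2). */
size_t sha1_pad_tail(const uint8_t *tail, uint64_t total_len, uint8_t out[128])
{
    const size_t rem = (size_t)(total_len % 64);
    /* 0x80 plus the 8-byte length must fit after the tail bytes */
    const size_t blocks = (rem < 56) ? 1 : 2;
    const uint64_t bits = total_len * 8;

    assert(out != NULL && (rem == 0 || tail != NULL));

    memset(out, 0x00, 128);
    if (rem != 0)
        memcpy(out, tail, rem);

    /* A single 1 bit follows the message */
    out[rem] = 0x80;

    /* Message length in bits, big-endian, in the last 8 bytes */
    for (size_t i = 0; i < 8; ++i)
        out[blocks * 64 - 1 - i] = (uint8_t)(bits >> (8 * i));

    return blocks;
}
```

The return value can be passed directly as the block count when the padded buffer is handed to the block function.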
If you experience "unexpected reloc type 0x03" when building a shared object then see What does unexpected reloc type 0x03 mean? on the Binutils mailing list.
Obtain Source Files
You need two Cryptogams source files for SHA-1. The first is arm-xlate.pl and the second is sha1-armv4-large.pl. Both are available in the OpenSSL sources. The following commands fetch OpenSSL and then peel off the two Cryptogams files of interest, along with OpenSSL's arm_arch.h for reference.
# Clone OpenSSL for the latest Cryptogams sources
git clone https://github.com/openssl/openssl.git
mkdir cryptogams/
cp ./openssl/crypto/perlasm/arm-xlate.pl ./cryptogams/
cp ./openssl/crypto/sha/asm/sha1-armv4-large.pl ./cryptogams/
cp ./openssl/crypto/arm_arch.h cryptogams/
cd cryptogams/
Create ASM File
The second step is to run sha1-armv4-large.pl to produce an assembly language source file that can be consumed by GCC. sha1-armv4-large.pl internally calls arm-xlate.pl. linux32 is the flavor used by the translate program. sha1-armv4.S is the output filename. In the command below note the *.S file extension, which is a capital S. Do not use a lowercase s, because GCC runs the C preprocessor only on *.S files and GCC must drive the compile and assemble step.
perl sha1-armv4-large.pl linux32 sha1-armv4.S
GCC is needed to drive the process because there are C macros in the source file. Some Cryptogam source files have this requirement, while some others do not. sha1-armv4 happens to have the requirement.
$ cat sha1-armv4.S
@ Copyright 2007-2018 The OpenSSL Project Authors. All Rights Reserved.
...
#ifndef __KERNEL__
# include "arm_arch.h"
#else
# define __ARM_ARCH__ __LINUX_ARM_ARCH__
#endif
...
At this point there is an ASM file but it needs two small fixups. First, arm_arch.h is an OpenSSL source file so the dependency must be removed. Second, GCC defines __ARM_ARCH instead of __ARM_ARCH__ so a sed is needed.
To fixup the source files execute the following two commands:
# Remove OpenSSL include
sed -i 's/# include "arm_arch.h"//g' sha1-armv4.S
# Fix GCC defines
sed -i 's/__ARM_ARCH__/__ARM_ARCH/g' sha1-armv4.S
Alternatively, instead of the two sed commands, you can open arm_arch.h, copy the defines and paste them directly into sha1-armv4.S. Take care when using arm_arch.h as it carries the OpenSSL license.
After the two fixups sha1-armv4.S is ready to be compiled by GCC.
Compile Source File
The source file is ready to be compiled and assembled. At this point there are two choices. First, you can use ARMv5t or higher which includes Thumb instructions. The following compiles the source file with ARMv5t.
$ gcc -march=armv5t -c sha1-armv4.S
The second choice uses ARMv4 and avoids Thumb instructions. If you want to avoid Thumb then add -marm to your compile command.
$ gcc -march=armv4 -marm -c sha1-armv4.S
Using the ARMv4 build as an example, you now have an object file with the following symbols. Symbols with a capital T are public and exported. Symbols with a lowercase t are private and should not be used.
$ gcc -march=armv4 -marm -c sha1-armv4.S
$ nm sha1-armv4.o
00000000 T sha1_block_data_order
And you can inspect the generated code with objdump.
$ objdump --disassemble sha1-armv4.o

sha1-armv4.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <sha1_block_data_order>:
   0:   e92d5ff0    push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr}
   4:   e0812302    add     r2, r1, r2, lsl #6
   8:   e89000f8    ldm     r0, {r3, r4, r5, r6, r7}
   c:   e59f858c    ldr     r8, [pc, #1420] ; 5a0 <sha1_block_data_order+0x5a0>
  10:   e1a0e00d    mov     lr, sp
  14:   e24dd03c    sub     sp, sp, #60     ; 0x3c
  18:   e1a05f65    ror     r5, r5, #30
  1c:   e1a06f66    ror     r6, r6, #30
  ...
Determine API
The next step is to determine the API so you can call it from a C program. Unfortunately the API is not documented and you have to dig around the OpenSSL sources. Fortunately there is only one function of interest, called sha1_block_data_order.
A quick grep of OpenSSL sources reveals the following for sha1_block_data_order.
openssl$ grep -nIR sha1_block_data_order | grep '\.c'
crypto/evp/e_sha_cbc_hmac_sha1.c:95: void sha1_block_data_order(void *c, const void *p, size_t len);
crypto/evp/e_sha_cbc_hmac_sha1.c:115: sha1_block_data_order(c, ptr, len / SHA_CBLOCK);
crypto/evp/e_sha_cbc_hmac_sha1.c:615: sha1_block_data_order(&key->md, data, 1);
crypto/evp/e_sha_cbc_hmac_sha1.c:631: sha1_block_data_order(&key->md, data, 1);
...
We need several more symbols, and they are OPENSSL_armcap_P, ARMV7_NEON and ARMV8_SHA1.
$ grep -nIR OPENSSL_armcap_P
...
crypto/armcap.c:20:unsigned int OPENSSL_armcap_P = 0;
Lather, rinse, repeat for ARMV7_NEON and ARMV8_SHA1.
Create C Header
The fifth step creates a C header file based on information from Determine API. The header file is needed for two reasons. First, it removes the OpenSSL dependency from your project. Second, it avoids OpenSSL licensing violations.
Below is the C Header file you can use. While it is not obvious, the len parameter from Determine API is a block count, not a byte count.
/* Header file for use with Cryptogam's ARMv4 SHA1.    */
/* Also see http://www.openssl.org/~appro/cryptogams/  */
/* https://wiki.openssl.org/index.php/Cryptogams_SHA.  */

#ifndef CRYPTOGAMS_SHA1_ARMV4_H
#define CRYPTOGAMS_SHA1_ARMV4_H

#include <stddef.h>  /* size_t */

#ifdef __cplusplus
extern "C" {
#endif

extern unsigned int OPENSSL_armcap_P;

/* blocks is a count of 64-byte blocks, not a byte count */
void sha1_block_data_order(void *state, const void *data, size_t blocks);

/* Auxval caps */
#ifndef HWCAP_NEON
# define HWCAP_NEON (1 << 12)
#endif
#ifndef HWCAP_SHA1
# define HWCAP_SHA1 (1 << 5)
#endif

/* OpenSSL caps */
#define ARMV7_NEON (1<<0)
#define ARMV8_SHA1 (1<<3)

#ifdef __cplusplus
}
#endif

#endif /* CRYPTOGAMS_SHA1_ARMV4_H */
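Since the third parameter is a block count, callers that hold a byte length need to convert it. The following is a minimal sketch of such a wrapper. The names sha1_block_fn, sha1_hash_full_blocks and sha1_block_stub are hypothetical and are not part of Cryptogams. The block function is passed as a pointer so the sketch compiles without the ARM object file, and the stub only records the block count it receives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as sha1_block_data_order: the last argument counts 64-byte blocks */
typedef void (*sha1_block_fn)(void *state, const void *data, size_t blocks);

/* Hypothetical wrapper: converts a byte length to a block count, hashes the
 * whole blocks, and returns the number of bytes consumed. Any remaining
 * (nbytes % 64) bytes are left for the caller to buffer or pad. */
size_t sha1_hash_full_blocks(sha1_block_fn fn, uint32_t state[5],
                             const uint8_t *data, size_t nbytes)
{
    const size_t blocks = nbytes / 64;

    assert(fn != NULL && state != NULL);
    if (blocks != 0)
        fn(state, data, blocks);

    return blocks * 64;
}

/* Stand-in block function for illustration only. It does not hash anything,
 * it just accumulates the block counts it was called with. */
size_t stub_blocks_seen = 0;

void sha1_block_stub(void *state, const void *data, size_t blocks)
{
    (void)state;
    (void)data;
    stub_blocks_seen += blocks;
}
```

On the target board you would pass sha1_block_data_order in place of the stub.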
Test Program
The final step is to test the integration of Cryptogam's SHA with your program.
$ gcc -std=c99 sha1-armv4-test.c ./sha1-armv4.o -o sha1-armv4-test.exe
$ ./sha1-armv4-test.exe
SHA1 hash of empty message: DA39A3EE5E6B4B0D...
Success!
And the test program is shown below.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>
#include "sha1-armv4.h"

/* processor caps */
unsigned int OPENSSL_armcap_P = 0;

int main(int argc, char* argv[])
{
    /* processor caps */
    if (getauxval(AT_HWCAP) & HWCAP_NEON)
        OPENSSL_armcap_P |= ARMV7_NEON;
    if (getauxval(AT_HWCAP) & HWCAP_SHA1)
        OPENSSL_armcap_P |= ARMV8_SHA1;

    /* empty message with padding */
    uint8_t message[64];
    memset(message, 0x00, sizeof(message));
    message[0] = 0x80;

    /* initial state */
    uint32_t state[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};

    sha1_block_data_order(state, message, 1);

    const uint8_t b1 = (uint8_t)(state[0] >> 24);
    const uint8_t b2 = (uint8_t)(state[0] >> 16);
    const uint8_t b3 = (uint8_t)(state[0] >>  8);
    const uint8_t b4 = (uint8_t)(state[0] >>  0);
    const uint8_t b5 = (uint8_t)(state[1] >> 24);
    const uint8_t b6 = (uint8_t)(state[1] >> 16);
    const uint8_t b7 = (uint8_t)(state[1] >>  8);
    const uint8_t b8 = (uint8_t)(state[1] >>  0);

    /* DA39A3EE5E6B4B0D... */
    printf("SHA1 hash of empty message: ");
    printf("%02X%02X%02X%02X%02X%02X%02X%02X...\n",
        b1, b2, b3, b4, b5, b6, b7, b8);

    int success = ((b1 == 0xDA) && (b2 == 0x39) && (b3 == 0xA3) && (b4 == 0xEE) &&
                   (b5 == 0x5E) && (b6 == 0x6B) && (b7 == 0x4B) && (b8 == 0x0D));

    if (success)
        printf("Success!\n");
    else
        printf("Failure!\n");

    return (success != 0 ? 0 : 1);
}
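If the test reports a failure, it can help to compare the assembly routine against a plain C version on the same input. The sketch below is a portable stand-in with the same signature as sha1_block_data_order, plus a helper that serializes the full 20-byte digest from the state words. Both names, sha1_block_ref and sha1_state_to_digest, are hypothetical and are not Cryptogams code. The reference is written for clarity rather than speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned int n)
{
    return (x << n) | (x >> (32 - n));
}

/* Hypothetical portable reference; blocks counts 64-byte blocks */
void sha1_block_ref(void *state, const void *data, size_t blocks)
{
    uint32_t *h = (uint32_t *)state;
    const uint8_t *p = (const uint8_t *)data;

    assert(state != NULL && (data != NULL || blocks == 0));

    while (blocks--) {
        uint32_t w[80];

        /* Load 16 big-endian words, then expand to 80 */
        for (int i = 0; i < 16; ++i)
            w[i] = ((uint32_t)p[4*i] << 24) | ((uint32_t)p[4*i+1] << 16) |
                   ((uint32_t)p[4*i+2] << 8) | (uint32_t)p[4*i+3];
        for (int i = 16; i < 80; ++i)
            w[i] = rotl32(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);

        uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];

        for (int i = 0; i < 80; ++i) {
            uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }

            const uint32_t t = rotl32(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rotl32(b, 30); b = a; a = t;
        }

        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
        p += 64;
    }
}

/* Serialize the five state words as a 20-byte big-endian digest */
void sha1_state_to_digest(const uint32_t state[5], uint8_t digest[20])
{
    for (size_t i = 0; i < 5; ++i) {
        digest[4*i+0] = (uint8_t)(state[i] >> 24);
        digest[4*i+1] = (uint8_t)(state[i] >> 16);
        digest[4*i+2] = (uint8_t)(state[i] >>  8);
        digest[4*i+3] = (uint8_t)(state[i] >>  0);
    }
}
```

Running both routines over the same padded block and comparing the five state words is a quick way to separate an integration problem from a padding or byte-order mistake in the caller.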
Symbol Names
The article used the same names as they appear in the Cryptogams source code. For example, sha1_block_data_order is the name of the function in the source code, and it will show up in the object file when compiled and in the library when linked.
It is possible the function and data names will collide if you also link to OpenSSL, either directly or indirectly. If you plan on using Cryptogams code in a shared object then you should rename all symbols to avoid collisions. To rename symbols for SHA-1 you should rename sha1_block_data_order and OPENSSL_armcap_P. Assuming you are using MYLIB as a prefix the following sed commands should do the job.
sed -i 's/OPENSSL/MYLIB/g' sha1-armv4.h sha1-armv4.S
sed -i 's/sha1_block_data_order/MYLIB_sha1_block_data_order/g' sha1-armv4.h sha1-armv4.S
You can verify public symbols were renamed with nm sha1-armv4.o. Generally speaking, all symbols with capital letters like T (public function), B (uninitialized data), C (common data), D (initialized data), and R (read-only data) should be renamed.
Benchmarks
You can perform a rough benchmark using the code shown below. Prior to executing the benchmark program you should move the CPU from on-demand or powersave to performance mode.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <string.h>
#include <sys/auxv.h>
#include "sha1-armv4.h"

/* processor caps */
unsigned int OPENSSL_armcap_P = 0;

int main(int argc, char* argv[])
{
    /* set processor caps */
    if (getauxval(AT_HWCAP) & HWCAP_NEON)
        OPENSSL_armcap_P |= ARMV7_NEON;
    if (getauxval(AT_HWCAP) & HWCAP_SHA1)
        OPENSSL_armcap_P |= ARMV8_SHA1;

    const unsigned int STEPS = 128;
    uint8_t* buf = (uint8_t*)malloc(STEPS*64+64);
    /* zero the whole buffer so no uninitialized memory is hashed */
    memset(buf, 0x00, STEPS*64+64);

    double elapsed = 0.0;
    size_t total = 0;

    struct timespec start, end;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);

    uint32_t state[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};

    do
    {
        size_t idx = 0;
        /* each call hashes one 64-byte block */
        for (unsigned int i=0; i<STEPS; ++i)
            sha1_block_data_order(state, buf, idx+1);
        total += 64*STEPS;

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
        elapsed = (end.tv_sec-start.tv_sec);
    }
    while (elapsed < 3 /* seconds */);

    /* Increase precision of elapsed time */
    elapsed = ((double)end.tv_sec-start.tv_sec) +
              ((double)end.tv_nsec-start.tv_nsec) / 1000 / 1000 / 1000;

    /* CPU freq of 1 GHz */
    const double cpuFreq = 1000.0*1000*1000;

    const double bytes = total;
    const double ghz = cpuFreq / 1000 / 1000 / 1000;
    const double mbs = bytes / elapsed / 1024 / 1024;
    const double cpb = elapsed * cpuFreq / bytes;

    printf("%.0f bytes\n", bytes);
    printf("%.02f mbs\n", mbs);
    printf("%.02f cpb\n", cpb);

    free(buf);

    return 0;
}
The results below are from a Libre Computer Tritium H3 with a Cortex-A7 Sun7i SoC running at 1 GHz. A C/C++ SHA implementation runs about 22 cpb on the dev-board. Notice sha1-armv4.S was compiled with -march=armv7.
$ gcc -std=c99 -march=armv7 -c sha1-armv4.S -o sha1-armv7.o
$ gcc -O3 -std=c99 sha1-armv7-test.c sha1-armv7.o -o sha1-armv7-test.exe
$ ./sha1-armv7-test.exe
180994048 bytes
57.59 mbs
16.56 cpb
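The two reported figures are tied together by the assumed clock frequency. The sketch below restates the arithmetic from the benchmark program as two small functions. The names cycles_per_byte and mib_per_second are hypothetical. Note that the program divides by 1024 twice, so the value it labels mbs is in units of 2^20 bytes per second.

```c
#include <assert.h>

/* cpb = elapsed seconds * clock frequency in Hz / bytes processed */
double cycles_per_byte(double seconds, double cpu_hz, double bytes)
{
    assert(bytes > 0.0);
    return seconds * cpu_hz / bytes;
}

/* Throughput in units of 2^20 bytes per second */
double mib_per_second(double seconds, double bytes)
{
    assert(seconds > 0.0);
    return bytes / seconds / 1024 / 1024;
}
```

As a cross-check on the output above, 57.59 * 1024 * 1024 bytes per second at 1 GHz works out to roughly 16.56 cycles per byte, which matches the reported value. The result is only as accurate as the frequency assumption, so a board that throttles or runs at a different clock will report a misleading cpb.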
iOS Builds
sha1-armv4 can be configured for iOS. Simply use ios32 or ios64 instead of linux32 as shown below.
$ perl sha1-armv4-large.pl ios32 sha1-armv4.S
$ clang -arch armv7 sha1-armv4.S -c
And then:
$ nm sha1-armv4.o
000012d0 s OPENSSL_armcap_P
00000004 C _OPENSSL_armcap_P
00000000 T _sha1_block_data_order
00001100 t sha1_block_data_order_armv8
00000560 t sha1_block_data_order_neon

$ otool -tV sha1-armv4.o
sha1-armv4.o:
(__TEXT,__text) section
_sha1_block_data_order:
00000000    f8dfc4ec    ldr.w   r12, [pc, #0x4ec]
00000004    f2af0308    subw    r3, pc, #0x8
00000008    f853c00c    ldr.w   r12, [r3, r12]
0000000c    f8dcc000    ldr.w   r12, [r12]
00000010    f01c0f08    tst.w   r12, #0x8
00000014    f0418074    bne.w   sha1_block_data_order_armv8
00000018    f01c0f01    tst.w   r12, #0x1
0000001c    f04082a0    bne.w   sha1_block_data_order_neon
00000020    e92d5ff0    push.w  {r4, r5, r6, r7, r8, r9, r10, r11, r12, lr}
...