Cryptogams SHA

From OpenSSLWiki
Revision as of 09:22, 24 May 2019 by Jwalton (talk | contribs) (Mention ios64)

Cryptogams is Andy Polyakov's project used to develop high-speed cryptographic primitives and share them with other developers. This wiki article will show you how to use the Cryptogams ARMv4 SHA-1 implementation. According to the head notes, the ARMv4 implementation runs around 6.5 cycles per byte (cpb). Typical C/C++ implementations run around 10 to 20 cpb, and Andy's routines should outperform all of them.

Andy's Cryptogam implementations are provided by OpenSSL, but they are also available standalone under a BSD license. The BSD-style license is permissive and allows developers to use Andy's high-speed cryptography without an OpenSSL dependency or OpenSSL licensing terms.

There are 6 steps to the process. The first step obtains the sources. The second step creates an ASM source file. The third step compiles and assembles the source file into an object file. The fourth step determines the API. The fifth step creates a C header file. The final step integrates the object file into a program. Once you create the files sha1-armv4.h and sha1-armv4.S you can use sed to restore symbols back to their Cryptogams name with sed -i 's|OPENSSL|CRYPTOGAMS|g' sha1-armv4.h sha1-armv4.S.

A few cautions before you begin. First, you are going to examine undocumented features of the OpenSSL library to learn how to work with the Cryptogam's sources. The Cryptogam sources are stable but things could change over time. Second, the ARMv4 implementation hashes full SHA blocks. You are responsible for things like data alignment, padding and side channel counter-measures.
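
Because the routine only consumes whole 64-byte blocks, padding is left to the caller. The sketch below shows one way to build the final padded block or blocks as described in the SHA-1 specification. The helper name sha1_pad_tail is hypothetical and is not part of Cryptogams or OpenSSL.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of Cryptogams or OpenSSL.
 * Builds the final padded block(s) for a message of total_len bytes.
 * tail points to the last (total_len % 64) bytes that did not fill a block.
 * out must hold 128 bytes. Returns the number of 64-byte blocks written. */
static size_t sha1_pad_tail(const uint8_t *tail, uint64_t total_len, uint8_t out[128])
{
    const size_t rem = (size_t)(total_len % 64);
    /* 0x80 plus the 8-byte length must fit, otherwise use a second block */
    const size_t blocks = (rem < 56) ? 1 : 2;
    const uint64_t bits = total_len * 8;

    memset(out, 0x00, 128);
    if (rem != 0)
        memcpy(out, tail, rem);
    out[rem] = 0x80;

    /* message length in bits, big-endian, in the last 8 bytes */
    for (size_t i = 0; i < 8; ++i)
        out[blocks * 64 - 1 - i] = (uint8_t)(bits >> (8 * i));

    return blocks;
}
```

The returned block count can be passed directly as the third argument of sha1_block_data_order, together with out as the data pointer.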

If you experience "unexpected reloc type 0x03" when building a shared object then see What does unexpected reloc type 0x03 mean? on the Binutils mailing list.

Obtain Source Files

There are two Cryptogams source files you need for Cryptogams SHA. The first is arm-xlate.pl and the second is sha1-armv4-large.pl. They are available in the OpenSSL sources. The following commands fetch OpenSSL and then peel off the two Cryptogams files of interest, along with the OpenSSL header arm_arch.h.

# Clone OpenSSL for the latest Cryptogams sources
git clone https://github.com/openssl/openssl.git

mkdir cryptogams/

cp ./openssl/crypto/perlasm/arm-xlate.pl ./cryptogams/
cp ./openssl/crypto/sha/asm/sha1-armv4-large.pl ./cryptogams/
cp ./openssl/crypto/arm_arch.h cryptogams/

cd cryptogams/

Create ASM File

The second step is to run sha1-armv4-large.pl to produce an assembly language source file that can be consumed by GCC. sha1-armv4-large.pl internally calls arm-xlate.pl. linux32 is the flavor used by the translate program. sha1-armv4.S is the output filename. In the command below note the *.S file extension, which is a capital S. Do not use a lowercase s because GCC must drive the compile and assemble step.

perl sha1-armv4-large.pl linux32 sha1-armv4.S

GCC is needed to drive the process because there are C macros in the source file. Some Cryptogam source files have this requirement, while some others do not. sha1-armv4 happens to have the requirement.

$ cat sha1-armv4.S
@ Copyright 2007-2018 The OpenSSL Project Authors. All Rights Reserved.
...

#ifndef __KERNEL__
# include "arm_arch.h"
#else
# define __ARM_ARCH__ __LINUX_ARM_ARCH__
#endif
...

At this point there is an ASM file but it needs two small fixups. First, arm_arch.h is an OpenSSL source file so the dependency must be removed. Second, GCC defines __ARM_ARCH instead of __ARM_ARCH__ so a sed is needed.

To fix up the source file, execute the following two commands:

# Remove OpenSSL include
sed -i 's/# include "arm_arch.h"//g' sha1-armv4.S

# Fix GCC defines
sed -i 's/__ARM_ARCH__/__ARM_ARCH/g' sha1-armv4.S

Alternatively, instead of the two sed commands, you can open arm_arch.h, copy the defines and paste them directly into sha1-armv4.S. Take care when using arm_arch.h as it carries the OpenSSL license.

After the two fixups sha1-armv4.S is ready to be compiled by GCC.
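
The second fixup relies on the compiler predefining __ARM_ARCH on ARM targets. As a rough check, the small C sketch below reports the value your toolchain supplies. The function name arm_arch_level is hypothetical, and the fallback value of 0 for non-ARM targets is a choice made for this example.

```c
/* Hypothetical helper: returns the ARM architecture level the compiler
 * is targeting, or 0 when __ARM_ARCH is not defined (non-ARM target). */
static int arm_arch_level(void)
{
#if defined(__ARM_ARCH)
    return __ARM_ARCH;
#else
    return 0;
#endif
}
```

Building this with the same -march option you plan to use for sha1-armv4.S shows which architecture level the preprocessor conditionals in the assembly file will see.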

Compile Source File

The source file is ready to be compiled and assembled. At this point there are two choices. First, you can use ARMv5t or higher which includes Thumb instructions. The following compiles the source file with ARMv5t.

$ gcc -march=armv5t -c sha1-armv4.S

The second choice uses ARMv4 and avoids Thumb instructions. If you want to avoid Thumb then add -marm to your compile command.

$ gcc -march=armv4 -marm -c sha1-armv4.S

Using the ARMv4 build as an example, you now have an object file with the following symbols. Symbols with a capital T are public and exported. Symbols with a lowercase t are private and should not be used.

$ gcc -march=armv4 -marm -c sha1-armv4.S
$ nm sha1-armv4.o
00000000 T sha1_block_data_order

And you can inspect the generated code with objdump.

$ objdump --disassemble sha1-armv4.o
sha1-armv4.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <sha1_block_data_order>:
   0:   e92d5ff0        push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr}
   4:   e0812302        add     r2, r1, r2, lsl #6
   8:   e89000f8        ldm     r0, {r3, r4, r5, r6, r7}
   c:   e59f858c        ldr     r8, [pc, #1420] ; 5a0 <sha1_block_data_order+0x5a0>
  10:   e1a0e00d        mov     lr, sp
  14:   e24dd03c        sub     sp, sp, #60     ; 0x3c
  18:   e1a05f65        ror     r5, r5, #30
  1c:   e1a06f66        ror     r6, r6, #30

...

Determine API

The next step is to determine the API so you can call it from a C program. Unfortunately the API is not documented and you have to dig around the OpenSSL sources. Fortunately there is only one function of interest, called sha1_block_data_order.

A quick grep of OpenSSL sources reveals the following for sha1_block_data_order.

openssl$ grep -nIR sha1_block_data_order | grep '\.c'
crypto/evp/e_sha_cbc_hmac_sha1.c:95:    void sha1_block_data_order(void *c, const void *p, size_t len);
crypto/evp/e_sha_cbc_hmac_sha1.c:115:        sha1_block_data_order(c, ptr, len / SHA_CBLOCK);
crypto/evp/e_sha_cbc_hmac_sha1.c:615:        sha1_block_data_order(&key->md, data, 1);
crypto/evp/e_sha_cbc_hmac_sha1.c:631:        sha1_block_data_order(&key->md, data, 1);
...

We need several more symbols, and they are OPENSSL_armcap_P, ARMV7_NEON and ARMV8_SHA1.

$ grep -nIR OPENSSL_armcap_P
...
crypto/armcap.c:20:unsigned int OPENSSL_armcap_P = 0;

Lather, rinse, repeat for ARMV7_NEON and ARMV8_SHA1.

Create C Header

The fifth step creates a C header file based on information from Determine API. The header file is needed for two reasons. First, it removes the OpenSSL dependency from your project. Second, it avoids OpenSSL licensing violations.

Below is the C Header file you can use. While it is not obvious, the len parameter from Determine API is a block count, not a byte count.

/* Header file for use with Cryptogam's ARMv4 SHA1.    */
/* Also see http://www.openssl.org/~appro/cryptogams/  */
/* https://wiki.openssl.org/index.php/Cryptogams_SHA.  */

#ifndef CRYPTOGAMS_SHA1_ARMV4_H
#define CRYPTOGAMS_SHA1_ARMV4_H

#include <stddef.h>  /* size_t */

#ifdef __cplusplus
extern "C" {
#endif

extern unsigned int OPENSSL_armcap_P;
void sha1_block_data_order(void *state, const void *data, size_t blocks);

/* Auxval caps */
#ifndef HWCAP_NEON
# define HWCAP_NEON (1 << 12)
#endif
#ifndef HWCAP_SHA1
# define HWCAP_SHA1 (1 << 5)
#endif

/* OpenSSL caps */
#define ARMV7_NEON (1<<0)
#define ARMV8_SHA1 (1<<3)

#ifdef __cplusplus
}
#endif

#endif  /* CRYPTOGAMS_SHA1_ARMV4_H */
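
Since the third argument is a block count, a caller with arbitrary byte lengths has to buffer partial blocks itself. The following is a minimal sketch of a streaming update, assuming a hypothetical sha1_stream context. The routine is passed in as a function pointer with the same shape as sha1_block_data_order, and count_blocks is only a stand-in so the sketch can be exercised without the assembly object.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as sha1_block_data_order: last argument counts 64-byte blocks */
typedef void (*sha1_block_fn)(void *state, const void *data, size_t blocks);

/* Hypothetical streaming context, not part of Cryptogams */
typedef struct {
    uint32_t state[5];
    uint8_t  buf[64];   /* bytes that have not yet filled a block */
    size_t   used;      /* number of valid bytes in buf */
    uint64_t total;     /* total message bytes seen so far */
} sha1_stream;

/* Stand-in for the assembly routine: only counts the blocks it is given */
static size_t g_blocks_seen = 0;
static void count_blocks(void *state, const void *data, size_t blocks)
{
    (void)state; (void)data;
    g_blocks_seen += blocks;
}

static void sha1_stream_update(sha1_stream *s, sha1_block_fn fn,
                               const uint8_t *data, size_t len)
{
    if (len == 0)
        return;
    s->total += len;

    /* top up a partially filled buffer first */
    if (s->used != 0) {
        size_t take = 64 - s->used;
        if (take > len)
            take = len;
        memcpy(s->buf + s->used, data, take);
        s->used += take;
        data += take;
        len -= take;
        if (s->used < 64)
            return;
        fn(s->state, s->buf, 1);
        s->used = 0;
    }

    /* hand all whole blocks to the routine in a single call */
    if (len >= 64) {
        const size_t blocks = len / 64;
        fn(s->state, data, blocks);
        data += blocks * 64;
        len -= blocks * 64;
    }

    /* keep the remainder for the next call */
    if (len != 0) {
        memcpy(s->buf, data, len);
        s->used = len;
    }
}
```

In a real program you would pass sha1_block_data_order instead of count_blocks and initialize state with the SHA-1 initial values before the first call.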

Test Program

The final step is to test the integration of Cryptogam's SHA with your program.

$ gcc -std=c99 sha1-armv4-test.c ./sha1-armv4.o -o sha1-armv4-test.exe
$ ./sha1-armv4-test.exe
SHA1 hash of empty message: DA39A3EE5E6B4B0D...
Success!

And the test program is shown below.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>
#include "sha1-armv4.h"

/* processor caps */
unsigned int OPENSSL_armcap_P = 0;

int main(int argc, char* argv[])
{
    /* processor caps */
    if (getauxval(AT_HWCAP) & HWCAP_NEON)
        OPENSSL_armcap_P |= ARMV7_NEON;
    if (getauxval(AT_HWCAP) & HWCAP_SHA1)
        OPENSSL_armcap_P |= ARMV8_SHA1;

    /* empty message with padding */
    uint8_t message[64];
    memset(message, 0x00, sizeof(message));
    message[0] = 0x80;

    /* initial state */
    uint32_t state[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};

    sha1_block_data_order(state, message, 1);

    const uint8_t b1 = (uint8_t)(state[0] >> 24);
    const uint8_t b2 = (uint8_t)(state[0] >> 16);
    const uint8_t b3 = (uint8_t)(state[0] >>  8);
    const uint8_t b4 = (uint8_t)(state[0] >>  0);
    const uint8_t b5 = (uint8_t)(state[1] >> 24);
    const uint8_t b6 = (uint8_t)(state[1] >> 16);
    const uint8_t b7 = (uint8_t)(state[1] >>  8);
    const uint8_t b8 = (uint8_t)(state[1] >>  0);

    /* DA39A3EE5E6B4B0D... */
    printf("SHA1 hash of empty message: ");
    printf("%02X%02X%02X%02X%02X%02X%02X%02X...\n",
        b1, b2, b3, b4, b5, b6, b7, b8);

    int success = ((b1 == 0xDA) && (b2 == 0x39) && (b3 == 0xA3) && (b4 == 0xEE) &&
                    (b5 == 0x5E) && (b6 == 0x6B) && (b7 == 0x4B) && (b8 == 0x0D));

    if (success)
        printf("Success!\n");
    else
        printf("Failure!\n");

    return (success != 0 ? 0 : 1);
}
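
The test program prints only the first eight bytes of the hash. The full 20-byte digest is the five state words written out in big-endian order, which can be sketched as follows. The helper name sha1_state_to_digest is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: writes the five 32-bit state words as a
 * 20-byte digest, most significant byte of each word first. */
static void sha1_state_to_digest(const uint32_t state[5], uint8_t digest[20])
{
    for (int i = 0; i < 5; ++i) {
        digest[4 * i + 0] = (uint8_t)(state[i] >> 24);
        digest[4 * i + 1] = (uint8_t)(state[i] >> 16);
        digest[4 * i + 2] = (uint8_t)(state[i] >>  8);
        digest[4 * i + 3] = (uint8_t)(state[i] >>  0);
    }
}
```

Shifting the words rather than copying memory keeps the result independent of the host byte order.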


Symbol Names

The article uses the same names as they appear in the Cryptogams source code. For example, sha1_block_data_order is the name of the function in the source code, and it will show up in the object file when compiled and in the library when linked.

It is possible the function and data names will collide if you also link to OpenSSL, either directly or indirectly. If you plan on using Cryptogams code in a shared object then you should rename all symbols to avoid collisions. To rename symbols for SHA-1 you should rename sha1_block_data_order and OPENSSL_armcap_P. Assuming you are using MYLIB as a prefix, the following sed commands should do the job.

sed -i 's/OPENSSL/MYLIB/g' sha1-armv4.h sha1-armv4.S
sed -i 's/sha1_block_data_order/MYLIB_sha1_block_data_order/g' sha1-armv4.h sha1-armv4.S

You can verify public symbols were renamed with nm sha1-armv4.o. Generally speaking, all symbols with capital letters like T (public function), B (uninitialized data), C (common data), D (initialized data), and R (read-only data) should be renamed.

Benchmarks

You can perform a rough benchmark using the code shown below. Prior to executing the benchmark program you should move the CPU from on-demand or powersave to performance mode.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <string.h>
#include <sys/auxv.h>
#include "sha1-armv4.h"

/* processor caps */
unsigned int OPENSSL_armcap_P = 0;

int main(int argc, char* argv[])
{
    /* set processor caps */
    if (getauxval(AT_HWCAP) & HWCAP_NEON)
        OPENSSL_armcap_P |= ARMV7_NEON;
    if (getauxval(AT_HWCAP) & HWCAP_SHA1)
        OPENSSL_armcap_P |= ARMV8_SHA1;

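    const unsigned int STEPS = 128;
    const size_t bufSize = STEPS*64+64;
    uint8_t* buf = (uint8_t*)malloc(bufSize);
    if (buf == NULL)
        return 1;
    /* zero the whole buffer so the hash never reads uninitialized memory */
    memset(buf, 0x00, bufSize);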

    double elapsed = 0.0;
    size_t total = 0;

    struct timespec start, end;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);

    uint32_t state[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};

    do
    {
        size_t idx = 0;
        for (unsigned int i=0; i<STEPS; ++i)
            sha1_block_data_order(state, buf, idx+1);
        total += 64*STEPS;
        
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
        elapsed = (end.tv_sec-start.tv_sec);
    }
    while (elapsed < 3 /* seconds */);

    /* Increase precision of elapsed time */
    elapsed = ((double)end.tv_sec-start.tv_sec) +
              ((double)end.tv_nsec-start.tv_nsec) / 1000 / 1000 / 1000;

    /* CPU freq of 1 GHz */
    const double cpuFreq = 1000.0*1000*1000;

    const double bytes = total;
    const double ghz = cpuFreq / 1000 / 1000 / 1000;
    const double mbs = bytes / elapsed / 1024 / 1024;
    const double cpb = elapsed * cpuFreq / bytes;
    
    printf("%.0f bytes\n", bytes);
    printf("%.02f mbs\n", mbs);
    printf("%.02f cpb\n", cpb);
    
    free(buf);
    
    return 0;
}

The results below are from a Libre Computer Tritium H3 with a Cortex-A7 Sun7i SoC running at 1 GHz. A C/C++ SHA implementation runs about 22 cpb on the dev-board. Notice sha1-armv4.S was compiled with -march=armv7.

$ gcc -std=c99 -march=armv7 -c sha1-armv4.S -o sha1-armv7.o
$ gcc -O3 -std=c99 sha1-armv7-test.c sha1-armv7.o -o sha1-armv7-test.exe
$ ./sha1-armv7-test.exe
180994048 bytes
57.59 mbs
16.56 cpb
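
The cpb figure follows directly from the throughput and the clock rate assumed in the benchmark program. The sketch below repeats that arithmetic as a small function. The name cycles_per_byte is hypothetical, and it assumes the throughput is in units of 1024*1024 bytes per second, which is how the benchmark program computes its mbs value.

```c
/* Hypothetical helper: cycles per byte from a throughput given in
 * units of 1024*1024 bytes per second and a CPU frequency in Hz. */
static double cycles_per_byte(double mib_per_sec, double cpu_hz)
{
    return cpu_hz / (mib_per_sec * 1024.0 * 1024.0);
}
```

With the numbers above, cycles_per_byte(57.59, 1000.0*1000*1000) comes to roughly 16.56. The result is only as good as the hard-coded 1 GHz value, so adjust cpuFreq if your board runs at a different clock.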

iOS Builds

sha1-armv4 can be configured for iOS. Simply use ios32 or ios64 instead of linux32 as shown below.

$ perl sha1-armv4-large.pl ios32 sha1-armv4.S
$ clang -arch armv7 sha1-armv4.S -c

And then:

$ nm sha1-armv4.o
000012d0 s OPENSSL_armcap_P
00000004 C _OPENSSL_armcap_P
00000000 T _sha1_block_data_order
00001100 t sha1_block_data_order_armv8
00000560 t sha1_block_data_order_neon

$ otool -tV sha1-armv4.o
sha1-armv4.o:
(__TEXT,__text) section
_sha1_block_data_order:
00000000        f8dfc4ec        ldr.w   r12, [pc, #0x4ec]
00000004        f2af0308        subw    r3, pc, #0x8
00000008        f853c00c        ldr.w   r12, [r3, r12]
0000000c        f8dcc000        ldr.w   r12, [r12]
00000010        f01c0f08        tst.w   r12, #0x8
00000014        f0418074        bne.w   sha1_block_data_order_armv8
00000018        f01c0f01        tst.w   r12, #0x1
0000001c        f04082a0        bne.w   sha1_block_data_order_neon
00000020        e92d5ff0        push.w  {r4, r5, r6, r7, r8, r9, r10, r11, r12, lr}
...
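
The first instructions of the listing load OPENSSL_armcap_P and test bit 0x8 and then bit 0x1 before falling through to the integer code. Read as C, that dispatch looks roughly like the sketch below. The enum and the function name sha1_select_path are hypothetical and only illustrate the order of the checks.

```c
/* Capability bits, same values as in the header shown earlier */
#define ARMV7_NEON (1u << 0)
#define ARMV8_SHA1 (1u << 3)

/* Hypothetical labels for the three code paths in the object file */
typedef enum {
    SHA1_PATH_INTEGER,  /* plain ARM/Thumb-2 integer code */
    SHA1_PATH_NEON,     /* sha1_block_data_order_neon */
    SHA1_PATH_ARMV8     /* sha1_block_data_order_armv8 */
} sha1_path;

/* Mirrors the order of the two tst/bne pairs in the listing:
 * the ARMv8 SHA-1 bit is checked first, then NEON. */
static sha1_path sha1_select_path(unsigned int armcap)
{
    if (armcap & ARMV8_SHA1)
        return SHA1_PATH_ARMV8;
    if (armcap & ARMV7_NEON)
        return SHA1_PATH_NEON;
    return SHA1_PATH_INTEGER;
}
```

This is why the test and benchmark programs set OPENSSL_armcap_P before the first call: leaving it at 0 always selects the integer path.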