Cryptogams SHA

From OpenSSLWiki

Revision as of 00:53, 24 May 2019

Cryptogams is Andy Polyakov's project for developing high-speed cryptographic primitives and sharing them with other developers. This wiki article shows you how to use the Cryptogams ARMv4 SHA-1 implementation. According to the head notes in the source file, the ARMv4 implementation runs at around 6.5 cycles per byte (cpb). Typical C/C++ implementations run at around 10 to 20 cpb, so Andy's routines should outperform them.

Andy's Cryptogams implementations ship with OpenSSL, but they are also available standalone under a BSD license. The permissive BSD-style license allows developers to use Andy's high-speed cryptography without an OpenSSL dependency or OpenSSL licensing terms.

There are six steps to the process. The first step obtains the sources. The second step creates an ASM source file. The third step compiles and assembles the source file into an object file. The fourth step determines the API. The fifth step creates a C header file. The final step integrates the object file into a program. Once you create the files sha1-armv4.h and sha1-armv4.S, you can use sed to restore the symbols to their Cryptogams names with sed -i 's|OPENSSL|CRYPTOGAMS|g' sha1-armv4.h sha1-armv4.S. If you do, your own code must then define and use CRYPTOGAMS_armcap_P instead of OPENSSL_armcap_P.

A few cautions before you begin. First, you are going to examine undocumented features of the OpenSSL library to learn how to work with the Cryptogams sources. The Cryptogams sources are stable, but things could change over time. Second, the ARMv4 implementation only hashes full 64-byte SHA-1 blocks. You are responsible for things like data alignment, message padding and side-channel countermeasures.
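Padding is the part most easily gotten wrong. SHA-1 appends a 0x80 byte, then zero bytes, then the message length in bits as a 64-bit big-endian integer, so the final data fits in one block when 55 or fewer bytes remain and needs two blocks otherwise. The sketch below builds those final blocks. sha1_pad_final is a hypothetical helper, not part of Cryptogams, and it assumes the caller has already hashed every complete 64-byte block.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build the final SHA-1 block(s) for a message.
 * tail     - trailing bytes that did not fill a whole 64-byte block
 * tail_len - number of trailing bytes, must be < 64
 * msg_len  - total message length in bytes (not bits)
 * out      - receives one or two padded 64-byte blocks
 * Returns the number of blocks written to out (1 or 2). */
static size_t sha1_pad_final(const uint8_t *tail, size_t tail_len,
                             uint64_t msg_len, uint8_t out[128])
{
    /* 0x80 byte plus 8 length bytes must fit after the tail */
    const size_t blocks = (tail_len + 1 + 8 <= 64) ? 1 : 2;
    const uint64_t bit_len = msg_len * 8;

    memset(out, 0, 128);
    if (tail_len > 0)
        memcpy(out, tail, tail_len);
    out[tail_len] = 0x80;                 /* the mandatory '1' bit */

    /* Message length in bits, big-endian, in the last 8 bytes */
    for (size_t i = 0; i < 8; ++i)
        out[blocks * 64 - 1 - i] = (uint8_t)(bit_len >> (8 * i));

    return blocks;
}
```

The returned count can be passed straight through as the block count, for example sha1_block_data_order(state, out, n).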

If you experience "unexpected reloc type 0x03" when building a shared object, see What does unexpected reloc type 0x03 mean? (https://sourceware.org/ml/binutils/2019-05/msg00287.html) on the Binutils mailing list.

Obtain Source Files

There are two source files you need for Cryptogams SHA. The first is arm-xlate.pl and the second is sha1-armv4-large.pl. Both are available in the OpenSSL sources. The following commands fetch OpenSSL and then peel off the two Cryptogams files of interest, along with OpenSSL's arm_arch.h for reference.

# Clone OpenSSL for the latest Cryptogams sources
git clone https://github.com/openssl/openssl.git

mkdir cryptogams/

cp ./openssl/crypto/perlasm/arm-xlate.pl ./cryptogams/
cp ./openssl/crypto/sha/asm/sha1-armv4-large.pl ./cryptogams/
cp ./openssl/crypto/arm_arch.h cryptogams/

cd cryptogams/

Create ASM File

The second step is to run sha1-armv4-large.pl to produce an assembly language source file that GCC can consume. sha1-armv4-large.pl internally calls arm-xlate.pl. linux32 is the flavor used by the translator, and sha1-armv4.S is the output filename. In the command below, note the *.S file extension with a capital S. Do not use a lowercase s, because GCC must drive the compile and assemble step (GCC runs the C preprocessor on *.S files, but not on *.s files).

perl sha1-armv4-large.pl linux32 sha1-armv4.S

GCC is needed to drive the process because there are C macros in the source file. Some Cryptogams source files have this requirement, while others do not. sha1-armv4 happens to have the requirement.

$ cat sha1-armv4.S
@ Copyright 2007-2018 The OpenSSL Project Authors. All Rights Reserved.
...

#ifndef __KERNEL__
# include "arm_arch.h"
#else
# define __ARM_ARCH__ __LINUX_ARM_ARCH__
#endif
...

At this point there is an ASM file, but it needs two small fixups. First, arm_arch.h is an OpenSSL source file, so the dependency must be removed. Second, GCC defines __ARM_ARCH rather than __ARM_ARCH__ (the latter comes from arm_arch.h), so a sed is needed.
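To see the macro difference on your own toolchain, a small C sketch like the one below can be compiled with the same -march flag you plan to use for sha1-armv4.S. It assumes only that GCC predefines __ARM_ARCH when targeting ARM (as the ARM C Language Extensions specify); __ARM_ARCH__ normally shows up as undefined because it is an OpenSSL arm_arch.h macro. report_arm_arch is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: print which ARM architecture macros the compiler
 * predefines for the current -march setting. Returns __ARM_ARCH, or 0
 * when the compiler is not targeting ARM. */
static int report_arm_arch(void)
{
    int level = 0;
#ifdef __ARM_ARCH
    level = __ARM_ARCH;
    printf("__ARM_ARCH = %d\n", level);
#else
    printf("__ARM_ARCH is not defined (not an ARM target)\n");
#endif
#ifdef __ARM_ARCH__
    printf("__ARM_ARCH__ = %d\n", __ARM_ARCH__);
#else
    printf("__ARM_ARCH__ is not defined (it comes from arm_arch.h)\n");
#endif
    return level;
}
```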

To fix up the source file, execute the following two commands:

# Remove OpenSSL include
sed -i 's/# include "arm_arch.h"//g' sha1-armv4.S

# Fix GCC defines
sed -i 's/__ARM_ARCH__/__ARM_ARCH/g' sha1-armv4.S

Alternatively, instead of the two sed commands, you can open arm_arch.h, copy the defines and paste them directly into sha1-armv4.S. Take care when using arm_arch.h, as it carries the OpenSSL license.

After the two fixups sha1-armv4.S is ready to be compiled by GCC.

Compile Source File

The source file is ready to be compiled and assembled. At this point there are two choices. First, you can use ARMv5t or higher, which includes Thumb instructions. The following command compiles the source file for ARMv5t.

$ gcc -march=armv5t -c sha1-armv4.S

The second choice uses ARMv4 and avoids Thumb instructions. If you want to avoid Thumb, then add -marm to your compile command.

$ gcc -march=armv4 -marm -c sha1-armv4.S

Using the ARMv4 build as an example, you now have an object file with the following symbols. Symbols with a capital T are public and exported. Symbols with a lowercase t are private and should not be used.

$ gcc -march=armv4 -marm -c sha1-armv4.S
$ nm sha1-armv4.o
00000000 T sha1_block_data_order

And you can inspect the generated code with objdump.

$ objdump --disassemble sha1-armv4.o
sha1-armv4.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <sha1_block_data_order>:
   0:   e92d5ff0        push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr}
   4:   e0812302        add     r2, r1, r2, lsl #6
   8:   e89000f8        ldm     r0, {r3, r4, r5, r6, r7}
   c:   e59f858c        ldr     r8, [pc, #1420] ; 5a0 <sha1_block_data_order+0x5a0>
  10:   e1a0e00d        mov     lr, sp
  14:   e24dd03c        sub     sp, sp, #60     ; 0x3c
  18:   e1a05f65        ror     r5, r5, #30
  1c:   e1a06f66        ror     r6, r6, #30

...

Determine API

The next step is to determine the API so you can call it from a C program. Unfortunately, the API is not documented and you have to dig around in the OpenSSL sources. Fortunately, there is only one function of interest: sha1_block_data_order.

A quick grep of OpenSSL sources reveals the following for sha1_block_data_order.

openssl$ grep -nIR sha1_block_data_order | grep '\.c'
crypto/evp/e_aes_cbc_hmac_sha1.c:95:    void sha1_block_data_order(void *c, const void *p, size_t len);
crypto/evp/e_aes_cbc_hmac_sha1.c:115:        sha1_block_data_order(c, ptr, len / SHA_CBLOCK);
crypto/evp/e_aes_cbc_hmac_sha1.c:615:        sha1_block_data_order(&key->md, data, 1);
crypto/evp/e_aes_cbc_hmac_sha1.c:631:        sha1_block_data_order(&key->md, data, 1);
...

We need several more symbols: OPENSSL_armcap_P, ARMV7_NEON and ARMV8_SHA1.

$ grep -nIR OPENSSL_armcap_P
...
crypto/armcap.c:20:unsigned int OPENSSL_armcap_P = 0;

Lather, rinse, repeat for ARMV7_NEON and ARMV8_SHA1.
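OPENSSL_armcap_P is a bit mask, and ARMV7_NEON and ARMV8_SHA1 are bits in it. The sketch below translates Linux auxiliary-vector bits into that mask as a pure function so the mapping can be checked off-target. It assumes the 32-bit ARM kernel ABI, where NEON is bit 12 of AT_HWCAP and the SHA-1 extension is bit 2 of AT_HWCAP2 (bit 5 of AT_HWCAP is the AArch64 SHA1 bit). The OpenSSL bit values, (1<<0) and (1<<3), match the header in the next step. armcap_from_auxv and the ARMCAP_/AUXV_ macro names are hypothetical.

```c
/* OpenSSL's bit layout for OPENSSL_armcap_P (same values as the header) */
#define ARMCAP_NEON  (1u << 0)   /* ARMV7_NEON */
#define ARMCAP_SHA1  (1u << 3)   /* ARMV8_SHA1 */

/* Assumed Linux 32-bit ARM auxv bits */
#define AUXV_HWCAP_NEON   (1ul << 12)  /* AT_HWCAP,  HWCAP_NEON  */
#define AUXV_HWCAP2_SHA1  (1ul << 2)   /* AT_HWCAP2, HWCAP2_SHA1 */

/* Hypothetical helper: translate getauxval(AT_HWCAP) and
 * getauxval(AT_HWCAP2) into a value for OPENSSL_armcap_P. */
static unsigned int armcap_from_auxv(unsigned long hwcap, unsigned long hwcap2)
{
    unsigned int caps = 0;
    if (hwcap & AUXV_HWCAP_NEON)
        caps |= ARMCAP_NEON;
    if (hwcap2 & AUXV_HWCAP2_SHA1)
        caps |= ARMCAP_SHA1;
    return caps;
}
```

On the target, the result would be assigned once at startup, e.g. OPENSSL_armcap_P = armcap_from_auxv(getauxval(AT_HWCAP), getauxval(AT_HWCAP2)).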

Create C Header

The fifth step creates a C header file based on the information from the Determine API step. The header file is needed for two reasons. First, it removes the OpenSSL dependency from your project. Second, it avoids OpenSSL licensing violations.

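Below is a C header file you can use. While it is not obvious, the len parameter found in the Determine API step is a block count, not a byte count: the OpenSSL call site passes len / SHA_CBLOCK, and SHA_CBLOCK is the 64-byte SHA-1 block size. The header therefore names the parameter blocks.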

/* Header file for use with Cryptogams ARMv4 SHA1.     */
/* Also see http://www.openssl.org/~appro/cryptogams/  */
/* https://wiki.openssl.org/index.php/Cryptogams_SHA.  */

#ifndef CRYPTOGAMS_SHA1_ARMV4_H
#define CRYPTOGAMS_SHA1_ARMV4_H

#include <stddef.h>  /* size_t */

#ifdef __cplusplus
extern "C" {
#endif

extern unsigned int OPENSSL_armcap_P;
void sha1_block_data_order(void *state, const void *data, size_t blocks);

/* Auxval caps */
#ifndef HWCAP_NEON
# define HWCAP_NEON (1 << 12)
#endif
#ifndef HWCAP_SHA1
# define HWCAP_SHA1 (1 << 5)  /* AArch64 AT_HWCAP bit; 32-bit ARM reports SHA1 in AT_HWCAP2 bit 2 */
#endif

/* OpenSSL caps */
#define ARMV7_NEON (1<<0)
#define ARMV8_SHA1 (1<<3)

#ifdef __cplusplus
}
#endif

#endif  /* CRYPTOGAMS_SHA1_ARMV4_H */
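Because the third parameter counts 64-byte blocks, a caller holding a byte buffer has to divide by 64 and carry the remainder into the next call or into the final padded block. A minimal sketch, assuming a hypothetical helper sha1_full_blocks that takes the compression function as a pointer with the same signature as sha1_block_data_order. count_blocks is a hypothetical stand-in that only counts blocks, so the helper can be exercised without the assembly.

```c
#include <stdint.h>
#include <stddef.h>

/* Same shape as sha1_block_data_order(state, data, blocks) */
typedef void (*sha1_blocks_fn)(void *state, const void *data, size_t blocks);

/* Hypothetical helper: hash every complete 64-byte block in data[0..len)
 * and return the number of trailing bytes left over (0..63). */
static size_t sha1_full_blocks(sha1_blocks_fn compress, uint32_t state[5],
                               const uint8_t *data, size_t len)
{
    const size_t blocks = len / 64;   /* block count, not byte count */
    if (blocks > 0)
        compress(state, data, blocks);
    return len % 64;                  /* caller buffers or pads these */
}

/* Off-target stand-in that only counts blocks; it does not hash. */
static size_t g_blocks_seen = 0;
static void count_blocks(void *state, const void *data, size_t blocks)
{
    (void)state;
    (void)data;
    g_blocks_seen += blocks;
}
```

When linked with sha1-armv4.o, the real function is passed instead: size_t rest = sha1_full_blocks(sha1_block_data_order, state, buf, len);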

Test Program

The final step is to test the integration of Cryptogams SHA with your program.

$ gcc -std=c99 sha1-armv4-test.c ./sha1-armv4.o -o sha1-armv4-test.exe
$ ./sha1-armv4-test.exe
SHA1 hash of empty message: DA39A3EE5E6B4B0D...
Success!

And the test program is shown below.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>
#include "sha1-armv4.h"

/* processor caps */
unsigned int OPENSSL_armcap_P = 0;

int main(int argc, char* argv[])
{
    /* processor caps */
    if (getauxval(AT_HWCAP) & HWCAP_NEON)
        OPENSSL_armcap_P |= ARMV7_NEON;
    /* 32-bit ARM reports the SHA1 extension in AT_HWCAP2, bit 2 */
    if (getauxval(AT_HWCAP2) & (1 << 2))
        OPENSSL_armcap_P |= ARMV8_SHA1;

    /* empty message with padding */
    uint8_t message[64];
    memset(message, 0x00, sizeof(message));
    message[0] = 0x80;

    /* initial state */
    uint32_t state[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};

    sha1_block_data_order(state, message, 1);

    const uint8_t b1 = (uint8_t)(state[0] >> 24);
    const uint8_t b2 = (uint8_t)(state[0] >> 16);
    const uint8_t b3 = (uint8_t)(state[0] >>  8);
    const uint8_t b4 = (uint8_t)(state[0] >>  0);
    const uint8_t b5 = (uint8_t)(state[1] >> 24);
    const uint8_t b6 = (uint8_t)(state[1] >> 16);
    const uint8_t b7 = (uint8_t)(state[1] >>  8);
    const uint8_t b8 = (uint8_t)(state[1] >>  0);

    /* DA39A3EE5E6B4B0D... */
    printf("SHA1 hash of empty message: ");
    printf("%02X%02X%02X%02X%02X%02X%02X%02X...\n",
        b1, b2, b3, b4, b5, b6, b7, b8);

    int success = ((b1 == 0xDA) && (b2 == 0x39) && (b3 == 0xA3) && (b4 == 0xEE) &&
                    (b5 == 0x5E) && (b6 == 0x6B) && (b7 == 0x4B) && (b8 == 0x0D));

    if (success)
        printf("Success!\n");
    else
        printf("Failure!\n");

    return (success != 0 ? 0 : 1);
}
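The test program above only checks the first eight bytes of one pre-padded block. To cross-check more, for example by feeding identical random blocks to both implementations and comparing all five state words, a slow portable C version of the compression function is handy. The sketch below is a plain FIPS 180-4 SHA-1 compression written for that purpose, not the Cryptogams algorithm. sha1_block_ref and rol32 are hypothetical names; sha1_block_ref uses the same (state, data, blocks) convention as the header.

```c
#include <stdint.h>
#include <stddef.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32) */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Hypothetical portable stand-in with the calling convention of
 * sha1_block_data_order: state[5], data, number of 64-byte blocks.
 * Straightforward FIPS 180-4 SHA-1 compression; slow, for cross-checks. */
static void sha1_block_ref(void *state, const void *data, size_t blocks)
{
    uint32_t *h = (uint32_t *)state;
    const uint8_t *p = (const uint8_t *)data;

    while (blocks--) {
        uint32_t w[80];
        for (int t = 0; t < 16; ++t)       /* big-endian message words */
            w[t] = ((uint32_t)p[4*t] << 24) | ((uint32_t)p[4*t+1] << 16) |
                   ((uint32_t)p[4*t+2] << 8) | (uint32_t)p[4*t+3];
        for (int t = 16; t < 80; ++t)      /* message schedule */
            w[t] = rol32(w[t-3] ^ w[t-8] ^ w[t-14] ^ w[t-16], 1);

        uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int t = 0; t < 80; ++t) {
            uint32_t f, k;
            if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
            else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
            else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
            uint32_t tmp = rol32(a, 5) + f + e + k + w[t];
            e = d; d = c; c = rol32(b, 30); b = a; a = tmp;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
        p += 64;
    }
}
```

A cross-check would hash the same buffer with sha1_block_data_order and sha1_block_ref from identical initial states, then memcmp the two state arrays.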


Symbol Names

This article uses the symbol names as they appear in the Cryptogams source code. For example, sha1_block_data_order is the function name in the source code, and it shows up in the object file when compiled and in the library when linked.

The function name may collide with OpenSSL's if you also link against OpenSSL, either directly or indirectly. If you are concerned about symbol collisions and potential incompatibility bugs, then rename the symbols in both sha1-armv4.S and sha1-armv4.h using sed.

Benchmarks

You can perform a rough benchmark using the code shown below. Before running the benchmark program, switch the CPU frequency governor from ondemand or powersave to performance. The program assumes a 1 GHz clock when computing cpb, so adjust cpuFreq for your board.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <string.h>
#include <sys/auxv.h>
#include "sha1-armv4.h"

/* processor caps */
unsigned int OPENSSL_armcap_P = 0;

int main(int argc, char* argv[])
{
    /* set processor caps */
    if (getauxval(AT_HWCAP) & HWCAP_NEON)
        OPENSSL_armcap_P |= ARMV7_NEON;
    /* 32-bit ARM reports the SHA1 extension in AT_HWCAP2, bit 2 */
    if (getauxval(AT_HWCAP2) & (1 << 2))
        OPENSSL_armcap_P |= ARMV8_SHA1;

    const unsigned int STEPS = 128;
    uint8_t* buf = (uint8_t*)malloc(STEPS*64+64);
    if (buf == NULL)
        return 1;
    memset(buf, 0x00, STEPS*64+64);

    double elapsed = 0.0;
    size_t total = 0;

    struct timespec start, end;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);

    uint32_t state[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};

    do
    {
        /* hash STEPS consecutive 64-byte blocks, one block per call */
        for (unsigned int i=0; i<STEPS; ++i)
            sha1_block_data_order(state, buf + 64*i, 1);
        total += 64*STEPS;
        
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
        elapsed = (end.tv_sec-start.tv_sec);
    }
    while (elapsed < 3 /* seconds */);

    /* Increase precision of elapsed time */
    elapsed = ((double)end.tv_sec-start.tv_sec) +
              ((double)end.tv_nsec-start.tv_nsec) / 1000 / 1000 / 1000;

    /* CPU freq of 1 GHz */
    const double cpuFreq = 1000.0*1000*1000;

    const double bytes = total;
    const double mbs = bytes / elapsed / 1024 / 1024;  /* MiB per second */
    const double cpb = elapsed * cpuFreq / bytes;      /* cycles per byte */
    
    printf("%.0f bytes\n", bytes);
    printf("%.02f mbs\n", mbs);
    printf("%.02f cpb\n", cpb);
    
    free(buf);
    
    return 0;
}

The results below are from a Libre Computer Tritium H3 with a Cortex-A7 Sun7i SoC running at 1 GHz. A C/C++ SHA-1 implementation runs at about 22 cpb on the dev board. Notice that sha1-armv4.S was compiled with -march=armv7.

$ gcc -std=c99 -march=armv7 -c sha1-armv4.S -o sha1-armv7.o
$ gcc -O3 -std=c99 sha1-armv7-test.c sha1-armv7.o -o sha1-armv7-test.exe
$ ./sha1-armv7-test.exe
180994048 bytes
57.59 mbs
16.56 cpb
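The three printed figures are consistent with each other. With N bytes hashed in t seconds and the program's assumed clock f = 10^9 Hz, the benchmark computes MiB/s as N / (t · 2^20) and cpb as f · t / N, so both cpb and the elapsed time can be recovered from the throughput alone:

```latex
\[
\text{cpb} = \frac{f\,t}{N} = \frac{f}{2^{20}\cdot\text{MiB/s}}
           = \frac{10^{9}}{1048576 \times 57.59} \approx 16.56,
\qquad
t = \frac{N}{2^{20}\cdot\text{MiB/s}}
  = \frac{180994048}{1048576 \times 57.59} \approx 3.00\ \text{s}
\]
```

The elapsed time matches the benchmark's 3-second loop. If the board does not actually run at 1 GHz, the reported cpb scales in proportion to the true clock.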

Autotools

If you are using Autotools, you can add the following to configure.ac and Makefile.am to conditionally compile sha1-armv4.S for 32-bit ARM (A32) platforms. You will need to detect ARM A32 yourself and set IS_ARM32 to a non-zero value. Also see Assembly Support in section 8.13 of the Automake manual.

First, the configure.ac recipe:

# Set ASM tools
AC_SUBST([CCAS], [$CC])
AC_SUBST([CCASFLAGS], [$CFLAGS])

# Used by Makefile.am to compile sha1-armv4.S
if test "$IS_ARM32" != "0"; then

   ## Save CFLAGS
   SAVED_CFLAGS="$CFLAGS"

   CFLAGS="-march=armv7-a -Wa,--noexecstack"
   AC_MSG_CHECKING([if $CC supports $CFLAGS])
   AC_COMPILE_IFELSE(
      [AC_LANG_PROGRAM([])],
      [AC_MSG_RESULT([yes]); AC_SUBST([tr_RESULT], [1])],
      [AC_MSG_RESULT([no]);  AC_SUBST([tr_RESULT], [0])]
   )

   if test "$tr_RESULT" = "1"; then

      AM_CONDITIONAL([CRYPTOGAMS_SHA1], [true])
      AC_SUBST([CRYPTOGAMS_FLAGS], [$CFLAGS])

   else

      CFLAGS="-march=armv7-a"
      AC_MSG_CHECKING([if $CC supports $CFLAGS])
      AC_COMPILE_IFELSE(
         [AC_LANG_PROGRAM([])],
         [AC_MSG_RESULT([yes]); AC_SUBST([tr_RESULT], [1])],
         [AC_MSG_RESULT([no]);  AC_SUBST([tr_RESULT], [0])]
      )

      if test "$tr_RESULT" = "1"; then
         AM_CONDITIONAL([CRYPTOGAMS_SHA1], [true])
         AC_SUBST([CRYPTOGAMS_FLAGS], [$CFLAGS])
      else
         AM_CONDITIONAL([CRYPTOGAMS_SHA1], [false])
      fi
   fi

   ## Restore CFLAGS
   CFLAGS="$SAVED_CFLAGS"
else
   # Required for other platforms
   AM_CONDITIONAL([CRYPTOGAMS_SHA1], [false])
fi

Second, the Makefile.am recipe:

if CRYPTOGAMS_SHA1

  sha_armv4_la_SOURCES = sha1-armv4.S
  sha_armv4_la_CCASFLAGS = $(AM_CFLAGS) $(CRYPTOGAMS_FLAGS)

  pkginclude_HEADERS += sha1-armv4.h

endif