Difference between revisions of "Cryptogams AES"

From OpenSSLWiki

Latest revision as of 06:09, 14 March 2021

Cryptogams is Andy Polyakov's project used to develop high speed cryptographic primitives and share them with other developers. This wiki article will show you how to use the Cryptogams ARMv4 AES implementation. According to the head notes in the source file, the ARMv4 implementation runs at around 22 to 40 cycles per byte (cpb). Typical C/C++ implementations run at around 50 to 80 cpb, so Andy's routines should outperform them.

Andy's Cryptogams implementations are provided by OpenSSL, but they are also available standalone under a BSD license. The BSD-style license is permissive and allows developers to use Andy's high speed cryptography without an OpenSSL dependency or OpenSSL's licensing terms.

There are 6 steps to the process. The first step obtains the sources. The second step creates an ASM source file. The third step compiles and assembles the source file into an object file. The fourth step determines the API. The fifth step creates a C header file. The final step integrates the object file into a program. Once you create the files aes-armv4.h and aes-armv4.S you can use sed to restore symbols back to their Cryptogams names with sed -i 's|OPENSSL|CRYPTOGAMS|g' aes-armv4.h aes-armv4.S.

A few cautions before you begin. First, you are going to examine undocumented features of the OpenSSL library to learn how to work with the Cryptogams sources. The Cryptogams sources are stable but things could change over time. Second, the ARMv4 implementation operates in ECB mode and encrypts or decrypts full AES blocks. You are responsible for things like padding and side channel countermeasures.
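
Because the routines only handle full 16-byte blocks, the caller has to pad the final block. The following is a minimal sketch that assumes PKCS#7 padding; the article does not prescribe a padding scheme, and the helper names pkcs7_padded_len, pkcs7_pad and pkcs7_unpad are hypothetical and not part of Cryptogams.

```c
#include <stddef.h>
#include <string.h>

#define AES_BLOCK_SIZE 16  /* AES always works on 16-byte blocks */

/* Hypothetical helper: length after PKCS#7 padding. PKCS#7 always
   adds between 1 and 16 bytes, even when len is a block multiple. */
size_t pkcs7_padded_len(size_t len)
{
    return (len / AES_BLOCK_SIZE + 1) * AES_BLOCK_SIZE;
}

/* Hypothetical helper: pad buf in place. buf holds len bytes of data
   and has room for cap bytes. Returns the padded length, or 0 if the
   buffer is too small. */
size_t pkcs7_pad(unsigned char *buf, size_t len, size_t cap)
{
    size_t padded = pkcs7_padded_len(len);
    if (padded > cap)
        return 0;
    /* Every padding byte holds the number of padding bytes */
    memset(buf + len, (int)(padded - len), padded - len);
    return padded;
}

/* Hypothetical helper: validate the padding and return the data length,
   or (size_t)-1 if the padding is malformed. */
size_t pkcs7_unpad(const unsigned char *buf, size_t len)
{
    if (len == 0 || len % AES_BLOCK_SIZE != 0)
        return (size_t)-1;
    unsigned char n = buf[len - 1];
    if (n == 0 || n > AES_BLOCK_SIZE)
        return (size_t)-1;
    for (size_t i = len - n; i < len; ++i)
        if (buf[i] != n)
            return (size_t)-1;
    return len - n;
}
```

Note that pkcs7_unpad is not constant time, so it is only a starting point if side channels are a concern for your application.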

At the moment Clang is miscompiling aes-armv4.S. Also see LLVM Issue 38133. You can work around the problem by compiling aes-armv4.S with -mthumb, but all data must be aligned. If you don't use aligned buffers then a SIGBUS could occur.
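
If you use the -mthumb workaround, one way to keep buffers aligned is to round a pointer up inside a slightly oversized buffer. This is a sketch only: the article does not state the required alignment, so the 16-byte boundary used in the usage note is a conservative assumption, and is_aligned and align_up are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if ptr sits on an 'align'-byte boundary.
   align must be a power of two. */
int is_aligned(const void *ptr, size_t align)
{
    return ((uintptr_t)ptr & ((uintptr_t)align - 1)) == 0;
}

/* Hypothetical helper: return the first address at or after p that is
   aligned to 'align' bytes (a power of two). The caller must provide
   at least align-1 spare bytes after the data it intends to store. */
unsigned char *align_up(unsigned char *p, size_t align)
{
    uintptr_t mask = (uintptr_t)align - 1;
    uintptr_t addr = (uintptr_t)p;
    /* Distance to the next boundary; zero when already aligned */
    return p + ((align - (addr & mask)) & mask);
}
```

For example, declaring unsigned char raw[16 + 15] and passing align_up(raw, 16) as the in or out pointer gives one AES block on a 16-byte boundary.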

Obtain Source Files

There are two Perl source files you need for Cryptogams AES. The first is arm-xlate.pl and the second is aes-armv4.pl. They are available in the OpenSSL sources. The following commands fetch OpenSSL and then peel off the Cryptogams files of interest, along with arm_arch.h, which is referenced in the next step.

# Clone OpenSSL for the latest Cryptogams sources
git clone https://github.com/openssl/openssl.git

mkdir cryptogams/

cp ./openssl/crypto/perlasm/arm-xlate.pl ./cryptogams/
cp ./openssl/crypto/aes/asm/aes-armv4.pl ./cryptogams/
cp ./openssl/crypto/arm_arch.h cryptogams/

cd cryptogams/

Create ASM File

The second step is to run aes-armv4.pl to produce an assembly language source file that can be consumed by GCC. aes-armv4.pl internally calls arm-xlate.pl. linux32 is the flavor used by the translate program. aes-armv4.S is the output filename. In the command below note the *.S file extension, which is a capital S. Do not use a lowercase s because GCC must drive the compile and assemble step, and GCC only runs the C preprocessor on files with the uppercase extension.

perl aes-armv4.pl linux32 aes-armv4.S

GCC is needed to drive the process because there are C macros in the source file. Some Cryptogams source files have this requirement, while others do not. aes-armv4 happens to have the requirement.

$ cat aes-armv4.S
@ Copyright 2007-2018 The OpenSSL Project Authors. All Rights Reserved.
...

#ifndef __KERNEL__
# include "arm_arch.h"
#else
# define __ARM_ARCH__ __LINUX_ARM_ARCH__
#endif
...

At this point there is an ASM file but it needs two small fixups. First, arm_arch.h is an OpenSSL source file so the dependency must be removed. Second, GCC defines __ARM_ARCH instead of __ARM_ARCH__ so a sed is needed.

To fix up the source file, execute the following two commands:

# Remove OpenSSL include
sed -i 's/# include "arm_arch.h"//g' aes-armv4.S

# Fix GCC defines
sed -i 's/__ARM_ARCH__/__ARM_ARCH/g' aes-armv4.S

Alternatively, instead of the two sed commands, you can open arm_arch.h, copy the defines and paste them directly into aes-armv4.S. Take care when using arm_arch.h as it carries the OpenSSL license.

After the two fixups aes-armv4.S is ready to be compiled by GCC.

Compile Source File

The source file is ready to be compiled and assembled. At this point there are two choices. First, you can use ARMv5t or higher which includes Thumb instructions. The following compiles the source file with ARMv5t.

$ gcc -march=armv5t -c aes-armv4.S

The second choice uses ARMv4 and avoids Thumb instructions. If you want to avoid Thumb then add -marm to your compile command.

$ gcc -march=armv4 -marm -c aes-armv4.S

Using ARMv5t as an example you now have an object file with the following symbols. Symbols with a capital T are public and exported. Symbols with a lowercase t are private and should not be used.

$ gcc -march=armv5t -c aes-armv4.S
$ nm aes-armv4.o
000011c0 T AES_decrypt
00000540 T AES_encrypt
00000b60 T AES_set_decrypt_key
00000b80 T AES_set_enc2dec_key
00000820 T AES_set_encrypt_key
00000cc0 t AES_Td
00000000 t AES_Te
000012c0 t _armv4_AES_decrypt
00000640 t _armv4_AES_encrypt
00000b80 t _armv4_AES_set_enc2dec_key
00000820 t _armv4_AES_set_encrypt_key

And you can inspect the generated code with objdump.

$ objdump --disassemble aes-armv4.o
aes-armv4.o:     file format elf32-littlearm
...

00000b60 <AES_set_decrypt_key>:
     b60:       e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
     b64:       ebffff2d        bl      820 <AES_set_encrypt_key>
     b68:       e3300000        teq     r0, #0
     b6c:       e49de004        pop     {lr}            ; (ldr lr, [sp], #4)
     b70:       1afffff9        bne     b5c <AES_set_encrypt_key+0x33c>
     b74:       e1a00002        mov     r0, r2
     b78:       e1a01002        mov     r1, r2
     b7c:       eaffffff        b       b80 <AES_set_enc2dec_key>

...

And trace it back to the source code in aes-armv4.S.

.globl  AES_set_decrypt_key
.type   AES_set_decrypt_key,%function
.align  5
AES_set_decrypt_key:
        str     lr,[sp,#-4]!            @ push lr
        bl      _armv4_AES_set_encrypt_key
        teq     r0,#0
        ldr     lr,[sp],#4              @ pop lr
        bne     .Labrt

        mov     r0,r2                   @ AES_set_encrypt_key preserves r2,
        mov     r1,r2                   @ which is AES_KEY *key
        b       _armv4_AES_set_enc2dec_key
.size   AES_set_decrypt_key,.-AES_set_decrypt_key

Determine API

The next step is to determine the API so you can call it from a C program. Unfortunately the API is not documented and you have to dig around the OpenSSL sources. The functions of interest are AES_set_encrypt_key, AES_set_decrypt_key, AES_encrypt and AES_decrypt.

A quick grep of OpenSSL sources reveals the following for AES_set_encrypt_key.

openssl$ grep -nIR AES_set_encrypt_key | grep '\.c'
...
crypto/aes/aes_core.c:632:int AES_set_encrypt_key(const unsigned char *userKey, const int bits,

Examining aes_core.c:632 reveals the following.

openssl$ cat -n crypto/aes/aes_core.c
...
   632  int AES_set_encrypt_key(const unsigned char *userKey, const int bits,
   633                          AES_KEY *key)
   634  {
            ...
   728      return 0;
   729  }

The next piece of information to discover is AES_KEY. Again a quick grep leads you to aes_key_st.

openssl$ grep -nIR AES_KEY | grep typedef
include/openssl/aes.h:39:typedef struct aes_key_st AES_KEY;

openssl$ cat -n include/openssl/aes.h
...

    31  struct aes_key_st {
    32  # ifdef AES_LONG
    33      unsigned long rd_key[4 * (AES_MAXNR + 1)];
    34  # else
    35      unsigned int rd_key[4 * (AES_MAXNR + 1)];
    36  # endif
    37      int rounds;
    38  };
    39  typedef struct aes_key_st AES_KEY;

Finally, you need AES_MAXNR from aes.h.

openssl$ grep -IR AES_MAXNR | grep define
include/openssl/aes.h:# define AES_MAXNR 14
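
The size of rd_key follows from AES_MAXNR: a key schedule needs 4 words for each round plus 4 for the initial round key, so 4 * (14 + 1) = 60 words covers the largest case. The sketch below works through that arithmetic using the round counts from FIPS 197. The helper names are hypothetical, and it is an assumption that the Cryptogams key setup stores these same counts in the rounds field.

```c
#include <stddef.h>

#define AES_MAXNR 14  /* value from OpenSSL's aes.h, as shown above */

/* Hypothetical helper: number of 32-bit round-key words required for
   a given round count, which is 4 * (rounds + 1). */
size_t aes_round_key_words(int rounds)
{
    return 4 * (size_t)(rounds + 1);
}

/* Hypothetical helper: FIPS 197 round count for a key size in bits.
   Returns -1 for key sizes AES does not define. */
int aes_rounds_for_bits(int bits)
{
    switch (bits) {
    case 128: return 10;
    case 192: return 12;
    case 256: return 14;
    default:  return -1;
    }
}
```

With these definitions, aes_round_key_words(AES_MAXNR) is 60, which matches the rd_key array length in aes_key_st.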

Lather, rinse, repeat for AES_set_decrypt_key, AES_encrypt and AES_decrypt. AES_encrypt can be found at crypto/aes/aes_core.c:787.

openssl$ grep -nIR AES_encrypt | grep '\.c'
...
crypto/aes/aes_core.c:787:void AES_encrypt(...)


openssl$ cat -n crypto/aes/aes_core.c
...
   783  /*
   784   * Encrypt a single block
   785   * in and out can overlap
   786   */
   787  void AES_encrypt(const unsigned char *in, unsigned char *out,
   788                   const AES_KEY *key) {

And AES_decrypt can be found at aes_core.c:978.

openssl$ grep -nIR AES_decrypt | grep '\.c'
...
crypto/aes/aes_core.c:978:void AES_decrypt(...)


openssl$ cat -n crypto/aes/aes_core.c

   974  /*
   975   * Decrypt a single block
   976   * in and out can overlap
   977   */
   978  void AES_decrypt(const unsigned char *in, unsigned char *out,
   979                   const AES_KEY *key)
   980  {

Create C Header

The fifth step creates a C header file based on information from Determine API. The header file is needed for two reasons. First, it removes the OpenSSL dependency from your project. Second, it avoids OpenSSL licensing violations.

Below is the C Header file you can use.

/* Header file for use with Cryptogam's ARMv4 AES.     */
/* Also see http://www.openssl.org/~appro/cryptogams/  */
/* https://wiki.openssl.org/index.php/Cryptogams_AES.  */

#ifndef CRYPTOGAMS_AES_ARMV4_H
#define CRYPTOGAMS_AES_ARMV4_H

#ifdef __cplusplus
extern "C" {
#endif

#define AES_MAXNR 14

typedef struct AES_KEY_st {
    unsigned int rd_key[4 * (AES_MAXNR + 1)];
    int rounds;
} AES_KEY;

int AES_set_encrypt_key(const unsigned char *userKey, const int bits, AES_KEY *key);
int AES_set_decrypt_key(const unsigned char *userKey, const int bits, AES_KEY *key);
void AES_encrypt(const unsigned char *in, unsigned char *out, const AES_KEY *key);
void AES_decrypt(const unsigned char *in, unsigned char *out, const AES_KEY *key);

#ifdef __cplusplus
}
#endif

#endif  /* CRYPTOGAMS_AES_ARMV4_H */

Test Program

The final step is to test the integration of Cryptogams AES with your program.

$ gcc -std=c99 aes-armv4-test.c ./aes-armv4.o -o aes-armv4-test.exe
$ ./aes-armv4-test.exe
Encrypted plaintext!
Decrypted ciphertext!

And the test program is shown below.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>
#include "aes-armv4.h"

int main(int argc, char* argv[])
{
    /* Test key from FIPS 197 */
    const uint8_t kb[] = {
        0x2b, 0x7e, 0x15, 0x16, 0x28, 0xae, 0xd2, 0xa6,
        0xab, 0xf7, 0x15, 0x88, 0x09, 0xcf, 0x4f, 0x3c
    };

    const uint8_t pb[] = {
        0x6b, 0xc1, 0xbe, 0xe2, 0x2e, 0x40, 0x9f, 0x96,
        0xe9, 0x3d, 0x7e, 0x11, 0x73, 0x93, 0x17, 0x2a
    };

    const uint8_t cb[] = {
        0x3a, 0xd7, 0x7b, 0xb4, 0x0d, 0x7a, 0x36, 0x60, 
        0xa8, 0x9e, 0xca, 0xf3, 0x24, 0x66, 0xef, 0x97
    };

    /* Scratch */
    uint8_t buf[16];
    int result;
    
    /********************************************/

    AES_KEY ekey;
    result = AES_set_encrypt_key(kb, sizeof(kb)*8, & ekey);
    assert(result == 0);

    AES_encrypt(pb, buf, &ekey);
    if (memcmp(cb, buf, 16) == 0)
        printf("Encrypted plaintext!\n");
    else
        printf("Failed to encrypt plaintext!\n");

    /********************************************/

    AES_KEY dkey;
    result = AES_set_decrypt_key(kb, sizeof(kb)*8, & dkey);
    assert(result == 0);

    AES_decrypt(cb, buf, &dkey);
    if (memcmp(pb, buf, 16) == 0)
        printf("Decrypted ciphertext!\n");
    else
        printf("Failed to decrypt ciphertext!\n");
    
    return 0;
}
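
The test program encrypts a single block. For longer messages the caller loops over the data one block at a time, since each call processes exactly 16 bytes. The following is a minimal sketch of such a loop. To keep it compilable without the Cryptogams object file it takes the block routine as a callback; the block_fn type, ecb_process and the xor_block stand-in are all hypothetical names, and xor_block is not a cipher.

```c
#include <stddef.h>

/* Hypothetical callback type mirroring the shape of AES_encrypt and
   AES_decrypt, with the key passed as an opaque pointer. */
typedef void (*block_fn)(const unsigned char *in, unsigned char *out,
                         const void *key);

/* Hypothetical helper: run fn over len bytes, 16 bytes per call.
   Returns 0 on success, -1 if len is not a multiple of the block size. */
int ecb_process(block_fn fn, const unsigned char *in, unsigned char *out,
                size_t len, const void *key)
{
    if (fn == NULL || len % 16 != 0)
        return -1;
    for (size_t off = 0; off < len; off += 16)
        fn(in + off, out + off, key);
    return 0;
}

/* Stand-in block routine for exercising the loop. It XORs each byte
   with a one-byte key and provides no security whatsoever. */
void xor_block(const unsigned char *in, unsigned char *out, const void *key)
{
    unsigned char k = *(const unsigned char *)key;
    for (size_t i = 0; i < 16; ++i)
        out[i] = in[i] ^ k;
}
```

In a real program you would drop the callback and call AES_encrypt(in + off, out + off, &ekey) directly inside the loop.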

Symbol Names

The article used the same names as they appear in the Cryptogams source code. For example, AES_set_encrypt_key and AES_set_decrypt_key are the names of two functions in the source code, and they will show up in the object file when compiled and in the library when linked.

It is possible the function and data names will collide if you also link to OpenSSL, either directly or indirectly. If you plan on using Cryptogams code in a shared object then you should rename all symbols to avoid collisions. To rename symbols for AES you should rename all AES_* names. Assuming you are using MYLIB as a prefix the following sed commands should do the job.

sed -i 's/OPENSSL/MYLIB/g' aes-armv4.h aes-armv4.S
sed -i 's/AES_decrypt/MYLIB_AES_decrypt/g' aes-armv4.h aes-armv4.S
sed -i 's/AES_encrypt/MYLIB_AES_encrypt/g' aes-armv4.h aes-armv4.S
sed -i 's/AES_set_decrypt_key/MYLIB_AES_set_decrypt_key/g' aes-armv4.h aes-armv4.S
sed -i 's/AES_set_enc2dec_key/MYLIB_AES_set_enc2dec_key/g' aes-armv4.h aes-armv4.S
sed -i 's/AES_set_encrypt_key/MYLIB_AES_set_encrypt_key/g' aes-armv4.h aes-armv4.S

You can verify public symbols were renamed by recompiling and running nm aes-armv4.o. Generally speaking, all symbols with capital letters like T (public function), B (uninitialized data), C (common data), D (initialized data), and R (read-only data) should be renamed.

Benchmarks

You can perform a rough benchmark using the code shown below. Prior to executing the benchmark program you should switch the CPU frequency governor from ondemand or powersave to performance.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <string.h>
#include "aes-armv4.h"

typedef unsigned char byte;

int main(int argc, char* argv[])
{
    const unsigned int STEPS = 128;
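    byte* buf = (byte*)malloc(STEPS*16+16);
    if (buf == NULL)
        return 1;
    /* Zero the whole buffer so no uninitialized bytes are encrypted */
    memset(buf, 0x00, STEPS*16+16);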

    AES_KEY ekey;
    (void)AES_set_encrypt_key(buf, 16*8, &ekey);

    double elapsed = 0.0;
    size_t total = 0;

    struct timespec start, end;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);

    do
    {
        /* Encrypt block i into block i+1; each AES block is 16 bytes */
        for (unsigned int i=0; i<STEPS; ++i)
            AES_encrypt(&buf[16*i], &buf[16*(i+1)], &ekey);
        total += 16*STEPS;
        
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
        elapsed = (end.tv_sec-start.tv_sec);
    }
    while (elapsed < 3 /* seconds */);

    /* Increase precision of elapsed time */
    elapsed = ((double)end.tv_sec-start.tv_sec) +
              ((double)end.tv_nsec-start.tv_nsec) / 1000 / 1000 / 1000;

    /* CPU freq of 950 MHz */
    const double cpuFreq = 950.0*1000*1000;

    const double bytes = total;
    const double ghz = cpuFreq / 1000 / 1000 / 1000;
    const double mbs = bytes / elapsed / 1024 / 1024;
    const double cpb = elapsed * cpuFreq / bytes;
    
    printf("%.0f bytes\n", bytes);
    printf("%.02f mbs\n", mbs);
    printf("%.02f cpb\n", cpb);
    
    free(buf);
    
    return 0;
}

The results below are from a BananaPi with a Cortex-A7 Sun7i SoC running at 950 MHz. A C/C++ AES implementation runs about 65 cpb on the dev-board. Notice aes-armv4.S was compiled with -march=armv7.

$ gcc -march=armv7 -c aes-armv4.S -o aes-armv7.o
$ gcc -O3 -std=c99 aes-armv7-test.c aes-armv7.o -o aes-armv7-test.exe
$ ./aes-armv7-test.exe
78426112 bytes
24.93 mbs
36.34 cpb

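And the following is from a Wandboard Dual with an NXP i.MX6 Cortex-A9 running at 1 GHz. A C/C++ implementation runs around 40 cpb. Note that the cpb figure below is consistent with cpuFreq still set to 950 MHz in the benchmark source; at 1 GHz the same 33.80 mbs works out to about 28.2 cpb.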

$ ./aes-armv7-test.exe
106029056 bytes
33.80 mbs
26.80 cpb
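
The mbs and cpb figures are tied together by the clock frequency, so one can be checked against the other. The sketch below restates the two formulas from the benchmark program and adds a cross-check that derives cpb from a reported throughput. The function names are hypothetical, and the clock frequencies passed in are whatever you believe the board is running at.

```c
/* Throughput in MiB per second, as computed by the benchmark program. */
double mib_per_second(double bytes, double seconds)
{
    return bytes / seconds / 1024.0 / 1024.0;
}

/* Cycles per byte, as computed by the benchmark program. */
double cycles_per_byte(double bytes, double seconds, double cpu_hz)
{
    return seconds * cpu_hz / bytes;
}

/* Cross-check: cycles per byte derived from a reported MiB/s figure
   and a clock frequency in Hz. */
double cpb_from_mibs(double mibs, double cpu_hz)
{
    return cpu_hz / (mibs * 1024.0 * 1024.0);
}
```

For the BananaPi run, cpb_from_mibs(24.93, 950e6) gives roughly 36.34, which agrees with the reported value. Running the same check on your own results is a quick way to confirm that cpuFreq in the benchmark source matches the board you are testing.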

Autotools

If you are using Autotools you can add the following to configure.ac and Makefile.am to conditionally compile aes-armv4.S for A-32 platforms. You will need to detect ARM A-32 and set IS_ARM32 to non-0. Also see Automake Assembly Support in section 8.13 of the manual.

First, the configure.ac recipe:

# Set ASM tools
AC_SUBST([CCAS], [$CC])
AC_SUBST([CCASFLAGS], [$CFLAGS])

# Used by Makefile.am to compile aes-armv4.S
if test "$IS_ARM32" != "0"; then

   ## Save CFLAGS
   SAVED_CFLAGS="$CFLAGS"

   CFLAGS="-march=armv7-a -Wa,--noexecstack"
   AC_MSG_CHECKING([if $CC supports $CFLAGS])
   AC_COMPILE_IFELSE(
      [AC_LANG_PROGRAM([])],
      [AC_MSG_RESULT([yes]); AC_SUBST([tr_RESULT], [1])],
      [AC_MSG_RESULT([no]);  AC_SUBST([tr_RESULT], [0])]
   )

   if test "$tr_RESULT" = "1"; then

      AM_CONDITIONAL([CRYPTOGAMS_AES], [true])
      AC_SUBST([CRYPTOGAMS_FLAGS], [$CFLAGS])

   else

      CFLAGS="-march=armv7-a"
      AC_MSG_CHECKING([if $CC supports $CFLAGS])
      AC_COMPILE_IFELSE(
         [AC_LANG_PROGRAM([])],
         [AC_MSG_RESULT([yes]); AC_SUBST([tr_RESULT], [1])],
         [AC_MSG_RESULT([no]);  AC_SUBST([tr_RESULT], [0])]
      )

      if test "$tr_RESULT" = "1"; then
         AM_CONDITIONAL([CRYPTOGAMS_AES], [true])
         AC_SUBST([CRYPTOGAMS_FLAGS], [$CFLAGS])
      else
         AM_CONDITIONAL([CRYPTOGAMS_AES], [false])
      fi
   fi

   ## Restore CFLAGS
   CFLAGS="$SAVED_CFLAGS"
else
   # Required for other platforms
   AM_CONDITIONAL([CRYPTOGAMS_AES], [false])
fi

Second, the Makefile.am recipe:

if CRYPTOGAMS_AES

  aes_armv4_la_SOURCES = aes-armv4.S
  aes_armv4_la_CCASFLAGS = $(AM_CFLAGS) $(CRYPTOGAMS_FLAGS)

  pkginclude_HEADERS += aes-armv4.h

endif

ARMv8 Builds

If you need AES for ARMv8 devices, which includes aarch64 and aarch32, then use aesv8-armx.pl. You would use something like this in your scripts:

if ! perl aesv8-armx.pl linux64 armx_aes.S; then
    echo "Failed to translate AES source file"
    exit 1
fi

aesv8-armx.pl provides the following functions:

  • aes_v8_set_encrypt_key
  • aes_v8_set_decrypt_key
  • aes_v8_encrypt
  • aes_v8_decrypt

If you are building for an Apple M1 you may need to use flavor ios64. The translate script does not handle osx64 properly.