Usage of renderscript to implement bouncycastle's BlockCipher on android is slow vs. pure-java


I've been using bouncycastle and its AESEngine to perform crypto operations in keepshare, but I've noticed that it's somewhat slow. renderscript looked like an attractive solution, since it lets you run algorithms in a semi-native fashion. So I took a simple AES implementation in C from bradconte.com and ported it to renderscript.

After a lot of trial and error just getting it to pass my test cases (mostly due to the undocumented fact that renderscript is little-endian, which took me a day to figure out), I've come to the conclusion that it's considerably slower than doing it in pure java.
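To illustrate why byte order matters here: the java side packs each 16-byte block into four ints using a ByteBuffer in LITTLE_ENDIAN order, and the kernel then reinterprets those uint32 words as bytes with a (BYTE *) cast. The C sketch below mirrors both halves. The helper names (load_le32, store_le32, host_is_little_endian, java_to_kernel_view) are hypothetical and not part of the original code, and the sketch assumes the kernel runs on a little-endian device. With java's default BIG_ENDIAN order, the four bytes inside each word would appear reversed to the kernel.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors ByteBuffer.order(ByteOrder.LITTLE_ENDIAN).getInt():
 * b[0] becomes the least significant byte of the word. */
static uint32_t load_le32(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Mirrors ByteBuffer.order(ByteOrder.LITTLE_ENDIAN).putInt(). */
static void store_le32(uint32_t w, uint8_t b[4])
{
    b[0] = (uint8_t)w;
    b[1] = (uint8_t)(w >> 8);
    b[2] = (uint8_t)(w >> 16);
    b[3] = (uint8_t)(w >> 24);
}

/* Returns 1 when the host stores the least significant byte first,
 * which is the layout the kernel's (BYTE *)in cast relies on. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Packs 16 bytes into four words the way the java side does, then views
 * the words as raw bytes the way the kernel does. On a little-endian host
 * the output equals the input. */
static void java_to_kernel_view(const uint8_t in[16], uint8_t out[16])
{
    uint32_t words[4];
    for (int i = 0; i < 4; ++i)
        words[i] = load_le32(&in[4 * i]);
    memcpy(out, words, sizeof words); /* same reinterpretation as (BYTE *)in */
}
```

For example, the bytes 01 02 03 04 load as the word 0x04030201, and on a little-endian host java_to_kernel_view hands the kernel exactly the bytes it was given.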

Processing a 50kB file takes roughly 1.7 seconds with the bouncycastle pure-java AESEngine implementation, while my RenderscriptAESEngine takes approximately 2.4 seconds. This involves about 4000 calls to my renderscript kernel, since each block is an individual kernel call.
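As a rough back-of-the-envelope figure, assuming the whole runtime is spent in the ~4000 block calls (file I/O and setup would make the real per-call numbers smaller), the measurements above work out to:

```latex
\begin{aligned}
t_{\text{RS}} &\approx \frac{2.4\ \text{s}}{4000} = 600\ \mu\text{s per block},\\
t_{\text{BC}} &\approx \frac{1.7\ \text{s}}{4000} = 425\ \mu\text{s per block},\\
t_{\text{RS}} - t_{\text{BC}} &\approx \frac{0.7\ \text{s}}{4000} = 175\ \mu\text{s per block}.
\end{aligned}
```

An extra cost on the order of a hundred microseconds per call is far larger than the arithmetic of one 16-byte AES block, which is consistent with fixed per-call overhead (copyFrom, forEach, copyTo) dominating, though these totals alone cannot separate the two.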

Unfortunately, the design is streaming/block oriented, so I cannot batch multiple blocks into a single allocation.
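For reference, this is the CBC mode that CBCBlockCipher wraps around the engine, with plaintext blocks P_i, ciphertext blocks C_i and key K. On the encryption side every call to E_K needs the previous ciphertext block, so the engine necessarily sees one block at a time. On the decryption side the D_K calls take only ciphertext blocks as input, but CBCBlockCipher still issues them one processBlock call per block.

```latex
\begin{aligned}
C_0 &= IV,\\
\text{encrypt:}\quad C_i &= E_K(P_i \oplus C_{i-1}),\\
\text{decrypt:}\quad P_i &= D_K(C_i) \oplus C_{i-1}, \qquad i = 1, \dots, n.
\end{aligned}
```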

Is there anything I can do to improve the performance?

For reference, my java-side and renderscript implementations are listed below:

RenderscriptAESEngine.java:

package com.hanhuy.android.keepshare

import java.io.{InputStream, OutputStream}
import java.nio.{ByteOrder, ByteBuffer}

import android.support.v8.renderscript.{Element, Allocation, RenderScript}
import com.hanhuy.keepassj.{StandardAesEngine, ICipherEngine}
import org.bouncycastle.crypto.io.{CipherInputStream, CipherOutputStream}
import org.bouncycastle.crypto.modes.CBCBlockCipher
import org.bouncycastle.crypto.paddings.PaddedBufferedBlockCipher
import org.bouncycastle.crypto.params.{ParametersWithIV, KeyParameter}
import org.bouncycastle.crypto.{BufferedBlockCipher, CipherParameters, BlockCipher}

/**
 * @author pfnguyen
 */
class RenderscriptAESEngine extends BlockCipher {

  val in = Array.ofDim[Int](4)
  var rs: RenderScript = _
  var scriptc: ScriptC_aes = _
  var ain: Allocation = _
  var aout: Allocation = _
  val outbuf = Array.ofDim[Int](4)

  override def getAlgorithmName = "AES"
  override def getBlockSize = 16
  override def reset() = {
    if (rs != null)
      rs.destroy()
    rs = null
    ain = null
    aout = null
  }

  override def init(b: Boolean, cipherParameters: CipherParameters) = {
    rs = RenderScript.create(Application.instance)
    scriptc = new ScriptC_aes(rs)
    ain = Allocation.createSized(rs, Element.U32_4(rs), 1)
    aout = Allocation.createSized(rs, Element.U32_4(rs), 1)

    cipherParameters match {
      case kp: KeyParameter =>
        val key = kp.getKey
        scriptc.set_key(key)
        scriptc.set_keysize(key.length * 8)
        scriptc.set_forEncryption(if (b) 1 else 0)
        scriptc.invoke_aes_init_key()
    }
  }

  override def processBlock(bytes: Array[Byte], i: Int, bytes1: Array[Byte], i1: Int) = {
    val buf = ByteBuffer.wrap(bytes)
    buf.position(i)
    buf.order(ByteOrder.LITTLE_ENDIAN)
    in(0) = buf.getInt()
    in(1) = buf.getInt()
    in(2) = buf.getInt()
    in(3) = buf.getInt()

    ain.copyFrom(in)
    scriptc.forEach_process_block(ain, aout)
    aout.copyTo(outbuf)

    val out = ByteBuffer.wrap(bytes1)
    out.position(i1)
    out.order(ByteOrder.LITTLE_ENDIAN)
    out.putInt(outbuf(0))
    out.putInt(outbuf(1))
    out.putInt(outbuf(2))
    out.putInt(outbuf(3))

    16
  }
}

class AesEngine extends ICipherEngine {
  override def getDisplayName = "AES"
  override def getCipherUuid = StandardAesEngine.getAesUuid

  def makeCipher(mode: Boolean, key: Array[Byte], iv: Array[Byte]): BufferedBlockCipher = {
    val aes = new RenderscriptAESEngine
    val k = new KeyParameter(key.clone())
    val i = new ParametersWithIV(k, iv.clone())
    val cipher = new PaddedBufferedBlockCipher(new CBCBlockCipher(aes))
    cipher.init(mode, i)
    cipher
  }

  override def EncryptStream(sPlainText: OutputStream, pbKey: Array[Byte], pbIV: Array[Byte]) =
    new CipherOutputStream(sPlainText, makeCipher(true, pbKey, pbIV))
  override def DecryptStream(sEncrypted: InputStream, pbKey: Array[Byte], pbIV: Array[Byte]) =
    new CipherInputStream(sEncrypted, makeCipher(false, pbKey, pbIV))
}

aes.rs (some parts elided to fit within post limit):

#pragma version(1)
#pragma rs java_package_name(com.hanhuy.android.keepshare)

/*********************************************************************
* Filename:   aes.c
* Author:     Brad Conte (brad AT bradconte.com)
* Copyright:
* Disclaimer: This code is presented "as is" without any guarantees.
* Details:    This code is the implementation of the AES algorithm and
              the CTR, CBC, and CCM modes of operation it can be used in.
              AES is specified by NIST in publication FIPS PUB 197,
              available at:
               * http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf .
              The CBC and CTR modes of operation are specified by
              NIST SP 800-38A, available at:
               * http://csrc.nist.gov/publications/nistpubs/800-38a/sp800-38a.pdf .
              The CCM mode of operation is specified by NIST SP 800-38C, available at:
               * http://csrc.nist.gov/publications/nistpubs/800-38C/SP800-38C_updated-July20_2007.pdf
*********************************************************************/

/****************************** MACROS ******************************/
// Rotates the word left by one byte: the most significant byte moves to the least significant position.
#define KE_ROTWORD(x) (((x) << 8) | ((x) >> 24))

#define BYTE uint8_t
#define WORD uint32_t

#define TRUE  1
#define FALSE 0

/**************************** DATA TYPES ****************************/
#define AES_128_ROUNDS 10
#define AES_192_ROUNDS 12
#define AES_256_ROUNDS 14

/**************************** VARIABLES *****************************/
// This is the specified AES SBox. To look up a substitution value, put the first
// nibble in the first index (row) and the second nibble in the second index (column).
static const BYTE aes_sbox[16][16] = SBox;

static const BYTE aes_invsbox[16][16] = SiBox;

// This table stores pre-calculated values for all possible GF(2^8) calculations. This
// table is only used by the (Inv)MixColumns steps.
// USAGE: The second index (column) is the coefficient of multiplication. Only 7 different
// coefficients are used: 0x01, 0x02, 0x03, 0x09, 0x0b, 0x0d, 0x0e, but multiplication by
// 1 is negligible leaving only 6 coefficients. Each column of the table is devoted to one
// of these coefficients, in the ascending order of value, from values 0x00 to 0xFF.
static const BYTE gf_mul[256][6] = gf_mul_table;

/*********************** FUNCTION DEFINITIONS ***********************/
// XORs the in and out buffers, storing the result in out. Length is in bytes.
/*
void xor_buf(const BYTE in[], BYTE out[], size_t len)
{
    size_t idx;

    for (idx = 0; idx < len; idx++)
        out[idx] ^= in[idx];
}
*/

/*******************
* AES
*******************/
/////////////////
// KEY EXPANSION
/////////////////

// Substitutes a word using the AES S-Box.
static WORD SubWord(WORD word)
{
    unsigned int result;

    result = (int)aes_sbox[(word >> 4) & 0x0000000F][word & 0x0000000F];
    result += (int)aes_sbox[(word >> 12) & 0x0000000F][(word >> 8) & 0x0000000F] << 8;
    result += (int)aes_sbox[(word >> 20) & 0x0000000F][(word >> 16) & 0x0000000F] << 16;
    result += (int)aes_sbox[(word >> 28) & 0x0000000F][(word >> 24) & 0x0000000F] << 24;
    return(result);
}

// Performs the action of generating the keys that will be used in every round of
// encryption. "key" is the user-supplied input key, "w" is the output key schedule,
// "keysize" is the length in bits of "key", must be 128, 192, or 256.
static void aes_key_setup(const BYTE key[], WORD w[], int keysize)
{
    int Nb=4,Nr,Nk,idx;
    WORD temp,Rcon[]={0x01000000,0x02000000,0x04000000,0x08000000,0x10000000,0x20000000,
                      0x40000000,0x80000000,0x1b000000,0x36000000,0x6c000000,0xd8000000,
                      0xab000000,0x4d000000,0x9a000000};

    switch (keysize) {
        case 128: Nr = 10; Nk = 4; break;
        case 192: Nr = 12; Nk = 6; break;
        case 256: Nr = 14; Nk = 8; break;
        default: return;
    }

    for (idx=0; idx < Nk; ++idx) {
        w[idx] = ((key[4 * idx]) << 24) | ((key[4 * idx + 1]) << 16) |
                   ((key[4 * idx + 2]) << 8) | ((key[4 * idx + 3]));
    }

    for (idx = Nk; idx < Nb * (Nr+1); ++idx) {
        temp = w[idx - 1];
        if ((idx % Nk) == 0)
            temp = SubWord(KE_ROTWORD(temp)) ^ Rcon[(idx-1)/Nk];
        else if (Nk > 6 && (idx % Nk) == 4)
            temp = SubWord(temp);
        w[idx] = w[idx-Nk] ^ temp;
    }
}

/////////////////
// ADD ROUND KEY
/////////////////

// Performs the AddRoundKey step. Each round has its own pre-generated 16-byte key in the
// form of 4 integers (the "w" array). Each integer is XOR'd by one column of the state.
// Also performs the job of InvAddRoundKey(); since the function is a simple XOR process,
// it is its own inverse.
static void AddRoundKey(BYTE state[][4], const WORD w[])
{
    BYTE subkey[4];

    // memcpy(subkey,&w[idx],4); // Not accurate for little-endian machines (w[] is packed big-endian)
    // Subkey 1
    subkey[0] = w[0] >> 24;
    subkey[1] = w[0] >> 16;
    subkey[2] = w[0] >> 8;
    subkey[3] = w[0];
    state[0][0] ^= subkey[0];
    state[1][0] ^= subkey[1];
    state[2][0] ^= subkey[2];
    state[3][0] ^= subkey[3];
    // Subkey 2
    subkey[0] = w[1] >> 24;
    subkey[1] = w[1] >> 16;
    subkey[2] = w[1] >> 8;
    subkey[3] = w[1];
    state[0][1] ^= subkey[0];
    state[1][1] ^= subkey[1];
    state[2][1] ^= subkey[2];
    state[3][1] ^= subkey[3];
    // Subkey 3
    subkey[0] = w[2] >> 24;
    subkey[1] = w[2] >> 16;
    subkey[2] = w[2] >> 8;
    subkey[3] = w[2];
    state[0][2] ^= subkey[0];
    state[1][2] ^= subkey[1];
    state[2][2] ^= subkey[2];
    state[3][2] ^= subkey[3];
    // Subkey 4
    subkey[0] = w[3] >> 24;
    subkey[1] = w[3] >> 16;
    subkey[2] = w[3] >> 8;
    subkey[3] = w[3];
    state[0][3] ^= subkey[0];
    state[1][3] ^= subkey[1];
    state[2][3] ^= subkey[2];
    state[3][3] ^= subkey[3];
}

/////////////////
// (Inv)SubBytes
/////////////////

// Performs the SubBytes step. All bytes in the state are substituted with a
// pre-calculated value from a lookup table.
static void SubBytes(BYTE state[][4])
{
    state[0][0] = aes_sbox[state[0][0] >> 4][state[0][0] & 0x0F];
    state[0][1] = aes_sbox[state[0][1] >> 4][state[0][1] & 0x0F];
    state[0][2] = aes_sbox[state[0][2] >> 4][state[0][2] & 0x0F];
    state[0][3] = aes_sbox[state[0][3] >> 4][state[0][3] & 0x0F];
    state[1][0] = aes_sbox[state[1][0] >> 4][state[1][0] & 0x0F];
    state[1][1] = aes_sbox[state[1][1] >> 4][state[1][1] & 0x0F];
    state[1][2] = aes_sbox[state[1][2] >> 4][state[1][2] & 0x0F];
    state[1][3] = aes_sbox[state[1][3] >> 4][state[1][3] & 0x0F];
    state[2][0] = aes_sbox[state[2][0] >> 4][state[2][0] & 0x0F];
    state[2][1] = aes_sbox[state[2][1] >> 4][state[2][1] & 0x0F];
    state[2][2] = aes_sbox[state[2][2] >> 4][state[2][2] & 0x0F];
    state[2][3] = aes_sbox[state[2][3] >> 4][state[2][3] & 0x0F];
    state[3][0] = aes_sbox[state[3][0] >> 4][state[3][0] & 0x0F];
    state[3][1] = aes_sbox[state[3][1] >> 4][state[3][1] & 0x0F];
    state[3][2] = aes_sbox[state[3][2] >> 4][state[3][2] & 0x0F];
    state[3][3] = aes_sbox[state[3][3] >> 4][state[3][3] & 0x0F];
}

static void InvSubBytes(BYTE state[][4])
{
    state[0][0] = aes_invsbox[state[0][0] >> 4][state[0][0] & 0x0F];
    state[0][1] = aes_invsbox[state[0][1] >> 4][state[0][1] & 0x0F];
    state[0][2] = aes_invsbox[state[0][2] >> 4][state[0][2] & 0x0F];
    state[0][3] = aes_invsbox[state[0][3] >> 4][state[0][3] & 0x0F];
    state[1][0] = aes_invsbox[state[1][0] >> 4][state[1][0] & 0x0F];
    state[1][1] = aes_invsbox[state[1][1] >> 4][state[1][1] & 0x0F];
    state[1][2] = aes_invsbox[state[1][2] >> 4][state[1][2] & 0x0F];
    state[1][3] = aes_invsbox[state[1][3] >> 4][state[1][3] & 0x0F];
    state[2][0] = aes_invsbox[state[2][0] >> 4][state[2][0] & 0x0F];
    state[2][1] = aes_invsbox[state[2][1] >> 4][state[2][1] & 0x0F];
    state[2][2] = aes_invsbox[state[2][2] >> 4][state[2][2] & 0x0F];
    state[2][3] = aes_invsbox[state[2][3] >> 4][state[2][3] & 0x0F];
    state[3][0] = aes_invsbox[state[3][0] >> 4][state[3][0] & 0x0F];
    state[3][1] = aes_invsbox[state[3][1] >> 4][state[3][1] & 0x0F];
    state[3][2] = aes_invsbox[state[3][2] >> 4][state[3][2] & 0x0F];
    state[3][3] = aes_invsbox[state[3][3] >> 4][state[3][3] & 0x0F];
}

/////////////////
// (Inv)ShiftRows
/////////////////

// Performs the ShiftRows step. All rows are shifted cylindrically to the left.
static void ShiftRows(BYTE state[][4])
{
    int t;

    // Shift left by 1
    t = state[1][0];
    state[1][0] = state[1][1];
    state[1][1] = state[1][2];
    state[1][2] = state[1][3];
    state[1][3] = t;
    // Shift left by 2
    t = state[2][0];
    state[2][0] = state[2][2];
    state[2][2] = t;
    t = state[2][1];
    state[2][1] = state[2][3];
    state[2][3] = t;
    // Shift left by 3
    t = state[3][0];
    state[3][0] = state[3][3];
    state[3][3] = state[3][2];
    state[3][2] = state[3][1];
    state[3][1] = t;
}

// All rows are shifted cylindrically to the right.
static void InvShiftRows(BYTE state[][4])
{
    int t;

    // Shift right by 1
    t = state[1][3];
    state[1][3] = state[1][2];
    state[1][2] = state[1][1];
    state[1][1] = state[1][0];
    state[1][0] = t;
    // Shift right by 2
    t = state[2][3];
    state[2][3] = state[2][1];
    state[2][1] = t;
    t = state[2][2];
    state[2][2] = state[2][0];
    state[2][0] = t;
    // Shift right by 3
    t = state[3][3];
    state[3][3] = state[3][0];
    state[3][0] = state[3][1];
    state[3][1] = state[3][2];
    state[3][2] = t;
}

/////////////////
// (Inv)MixColumns
/////////////////

// Performs the MixColumns step. Each column of the state is multiplied by a fixed matrix
// over the Galois field GF(2^8). All multiplication is pre-computed in a table.
// Addition is equivalent to XOR. (Must always make a copy of the column as the original
// values will be destroyed.)
static void MixColumns(BYTE state[][4])
{
    BYTE col[4];

    // Column 1
    col[0] = state[0][0];
    col[1] = state[1][0];
    col[2] = state[2][0];
    col[3] = state[3][0];
    state[0][0] = gf_mul[col[0]][0];
    state[0][0] ^= gf_mul[col[1]][1];
    state[0][0] ^= col[2];
    state[0][0] ^= col[3];
    state[1][0] = col[0];
    state[1][0] ^= gf_mul[col[1]][0];
    state[1][0] ^= gf_mul[col[2]][1];
    state[1][0] ^= col[3];
    state[2][0] = col[0];
    state[2][0] ^= col[1];
    state[2][0] ^= gf_mul[col[2]][0];
    state[2][0] ^= gf_mul[col[3]][1];
    state[3][0] = gf_mul[col[0]][1];
    state[3][0] ^= col[1];
    state[3][0] ^= col[2];
    state[3][0] ^= gf_mul[col[3]][0];
    // Column 2
    col[0] = state[0][1];
    col[1] = state[1][1];
    col[2] = state[2][1];
    col[3] = state[3][1];
    state[0][1] = gf_mul[col[0]][0];
    state[0][1] ^= gf_mul[col[1]][1];
    state[0][1] ^= col[2];
    state[0][1] ^= col[3];
    state[1][1] = col[0];
    state[1][1] ^= gf_mul[col[1]][0];
    state[1][1] ^= gf_mul[col[2]][1];
    state[1][1] ^= col[3];
    state[2][1] = col[0];
    state[2][1] ^= col[1];
    state[2][1] ^= gf_mul[col[2]][0];
    state[2][1] ^= gf_mul[col[3]][1];
    state[3][1] = gf_mul[col[0]][1];
    state[3][1] ^= col[1];
    state[3][1] ^= col[2];
    state[3][1] ^= gf_mul[col[3]][0];
    // Column 3
    col[0] = state[0][2];
    col[1] = state[1][2];
    col[2] = state[2][2];
    col[3] = state[3][2];
    state[0][2] = gf_mul[col[0]][0];
    state[0][2] ^= gf_mul[col[1]][1];
    state[0][2] ^= col[2];
    state[0][2] ^= col[3];
    state[1][2] = col[0];
    state[1][2] ^= gf_mul[col[1]][0];
    state[1][2] ^= gf_mul[col[2]][1];
    state[1][2] ^= col[3];
    state[2][2] = col[0];
    state[2][2] ^= col[1];
    state[2][2] ^= gf_mul[col[2]][0];
    state[2][2] ^= gf_mul[col[3]][1];
    state[3][2] = gf_mul[col[0]][1];
    state[3][2] ^= col[1];
    state[3][2] ^= col[2];
    state[3][2] ^= gf_mul[col[3]][0];
    // Column 4
    col[0] = state[0][3];
    col[1] = state[1][3];
    col[2] = state[2][3];
    col[3] = state[3][3];
    state[0][3] = gf_mul[col[0]][0];
    state[0][3] ^= gf_mul[col[1]][1];
    state[0][3] ^= col[2];
    state[0][3] ^= col[3];
    state[1][3] = col[0];
    state[1][3] ^= gf_mul[col[1]][0];
    state[1][3] ^= gf_mul[col[2]][1];
    state[1][3] ^= col[3];
    state[2][3] = col[0];
    state[2][3] ^= col[1];
    state[2][3] ^= gf_mul[col[2]][0];
    state[2][3] ^= gf_mul[col[3]][1];
    state[3][3] = gf_mul[col[0]][1];
    state[3][3] ^= col[1];
    state[3][3] ^= col[2];
    state[3][3] ^= gf_mul[col[3]][0];
}

static void InvMixColumns(BYTE state[][4])
{
    BYTE col[4];

    // Column 1
    col[0] = state[0][0];
    col[1] = state[1][0];
    col[2] = state[2][0];
    col[3] = state[3][0];
    state[0][0] = gf_mul[col[0]][5];
    state[0][0] ^= gf_mul[col[1]][3];
    state[0][0] ^= gf_mul[col[2]][4];
    state[0][0] ^= gf_mul[col[3]][2];
    state[1][0] = gf_mul[col[0]][2];
    state[1][0] ^= gf_mul[col[1]][5];
    state[1][0] ^= gf_mul[col[2]][3];
    state[1][0] ^= gf_mul[col[3]][4];
    state[2][0] = gf_mul[col[0]][4];
    state[2][0] ^= gf_mul[col[1]][2];
    state[2][0] ^= gf_mul[col[2]][5];
    state[2][0] ^= gf_mul[col[3]][3];
    state[3][0] = gf_mul[col[0]][3];
    state[3][0] ^= gf_mul[col[1]][4];
    state[3][0] ^= gf_mul[col[2]][2];
    state[3][0] ^= gf_mul[col[3]][5];
    // Column 2
    col[0] = state[0][1];
    col[1] = state[1][1];
    col[2] = state[2][1];
    col[3] = state[3][1];
    state[0][1] = gf_mul[col[0]][5];
    state[0][1] ^= gf_mul[col[1]][3];
    state[0][1] ^= gf_mul[col[2]][4];
    state[0][1] ^= gf_mul[col[3]][2];
    state[1][1] = gf_mul[col[0]][2];
    state[1][1] ^= gf_mul[col[1]][5];
    state[1][1] ^= gf_mul[col[2]][3];
    state[1][1] ^= gf_mul[col[3]][4];
    state[2][1] = gf_mul[col[0]][4];
    state[2][1] ^= gf_mul[col[1]][2];
    state[2][1] ^= gf_mul[col[2]][5];
    state[2][1] ^= gf_mul[col[3]][3];
    state[3][1] = gf_mul[col[0]][3];
    state[3][1] ^= gf_mul[col[1]][4];
    state[3][1] ^= gf_mul[col[2]][2];
    state[3][1] ^= gf_mul[col[3]][5];
    // Column 3
    col[0] = state[0][2];
    col[1] = state[1][2];
    col[2] = state[2][2];
    col[3] = state[3][2];
    state[0][2] = gf_mul[col[0]][5];
    state[0][2] ^= gf_mul[col[1]][3];
    state[0][2] ^= gf_mul[col[2]][4];
    state[0][2] ^= gf_mul[col[3]][2];
    state[1][2] = gf_mul[col[0]][2];
    state[1][2] ^= gf_mul[col[1]][5];
    state[1][2] ^= gf_mul[col[2]][3];
    state[1][2] ^= gf_mul[col[3]][4];
    state[2][2] = gf_mul[col[0]][4];
    state[2][2] ^= gf_mul[col[1]][2];
    state[2][2] ^= gf_mul[col[2]][5];
    state[2][2] ^= gf_mul[col[3]][3];
    state[3][2] = gf_mul[col[0]][3];
    state[3][2] ^= gf_mul[col[1]][4];
    state[3][2] ^= gf_mul[col[2]][2];
    state[3][2] ^= gf_mul[col[3]][5];
    // Column 4
    col[0] = state[0][3];
    col[1] = state[1][3];
    col[2] = state[2][3];
    col[3] = state[3][3];
    state[0][3] = gf_mul[col[0]][5];
    state[0][3] ^= gf_mul[col[1]][3];
    state[0][3] ^= gf_mul[col[2]][4];
    state[0][3] ^= gf_mul[col[3]][2];
    state[1][3] = gf_mul[col[0]][2];
    state[1][3] ^= gf_mul[col[1]][5];
    state[1][3] ^= gf_mul[col[2]][3];
    state[1][3] ^= gf_mul[col[3]][4];
    state[2][3] = gf_mul[col[0]][4];
    state[2][3] ^= gf_mul[col[1]][2];
    state[2][3] ^= gf_mul[col[2]][5];
    state[2][3] ^= gf_mul[col[3]][3];
    state[3][3] = gf_mul[col[0]][3];
    state[3][3] ^= gf_mul[col[1]][4];
    state[3][3] ^= gf_mul[col[2]][2];
    state[3][3] ^= gf_mul[col[3]][5];
}

static void aes_encrypt(const BYTE in[], BYTE out[], const WORD key[], int keysize)
{
    BYTE state[4][4];

    // Copy input array (should be 16 bytes long) to a matrix (sequential bytes are ordered
    // by row, not col) called "state" for processing.
    // *** Implementation note: The official AES documentation references the state by
    // column, then row. Accessing an element in C requires row then column. Thus, all state
    // references in AES must have the column and row indexes reversed for C implementation.
    state[0][0] = in[0];
    state[1][0] = in[1];
    state[2][0] = in[2];
    state[3][0] = in[3];
    state[0][1] = in[4];
    state[1][1] = in[5];
    state[2][1] = in[6];
    state[3][1] = in[7];
    state[0][2] = in[8];
    state[1][2] = in[9];
    state[2][2] = in[10];
    state[3][2] = in[11];
    state[0][3] = in[12];
    state[1][3] = in[13];
    state[2][3] = in[14];
    state[3][3] = in[15];

    // Perform the necessary number of rounds. The round key is added first.
    // The last round does not perform the MixColumns step.
    AddRoundKey(state,&key[0]);
    SubBytes(state); ShiftRows(state); MixColumns(state); AddRoundKey(state,&key[4]);
    SubBytes(state); ShiftRows(state); MixColumns(state); AddRoundKey(state,&key[8]);
    SubBytes(state); ShiftRows(state); MixColumns(state); AddRoundKey(state,&key[12]);
    SubBytes(state); ShiftRows(state); MixColumns(state); AddRoundKey(state,&key[16]);
    SubBytes(state); ShiftRows(state); MixColumns(state); AddRoundKey(state,&key[20]);
    SubBytes(state); ShiftRows(state); MixColumns(state); AddRoundKey(state,&key[24]);
    SubBytes(state); ShiftRows(state); MixColumns(state); AddRoundKey(state,&key[28]);
    SubBytes(state); ShiftRows(state); MixColumns(state); AddRoundKey(state,&key[32]);
    SubBytes(state); ShiftRows(state); MixColumns(state); AddRoundKey(state,&key[36]);
    if (keysize != 128) {
        SubBytes(state); ShiftRows(state); MixColumns(state); AddRoundKey(state,&key[40]);
        SubBytes(state); ShiftRows(state); MixColumns(state); AddRoundKey(state,&key[44]);
        if (keysize != 192) {
            SubBytes(state); ShiftRows(state); MixColumns(state); AddRoundKey(state,&key[48]);
            SubBytes(state); ShiftRows(state); MixColumns(state); AddRoundKey(state,&key[52]);
            SubBytes(state); ShiftRows(state); AddRoundKey(state,&key[56]);
        }
        else {
            SubBytes(state); ShiftRows(state); AddRoundKey(state,&key[48]);
        }
    }
    else {
        SubBytes(state); ShiftRows(state); AddRoundKey(state,&key[40]);
    }

    // Copy the state to the output array.
    out[0] = state[0][0];
    out[1] = state[1][0];
    out[2] = state[2][0];
    out[3] = state[3][0];
    out[4] = state[0][1];
    out[5] = state[1][1];
    out[6] = state[2][1];
    out[7] = state[3][1];
    out[8] = state[0][2];
    out[9] = state[1][2];
    out[10] = state[2][2];
    out[11] = state[3][2];
    out[12] = state[0][3];
    out[13] = state[1][3];
    out[14] = state[2][3];
    out[15] = state[3][3];
}

static void aes_decrypt(const BYTE in[], BYTE out[], const WORD key[], int keysize)
{
    BYTE state[4][4];

    // Copy the input to the state.
    state[0][0] = in[0];
    state[1][0] = in[1];
    state[2][0] = in[2];
    state[3][0] = in[3];
    state[0][1] = in[4];
    state[1][1] = in[5];
    state[2][1] = in[6];
    state[3][1] = in[7];
    state[0][2] = in[8];
    state[1][2] = in[9];
    state[2][2] = in[10];
    state[3][2] = in[11];
    state[0][3] = in[12];
    state[1][3] = in[13];
    state[2][3] = in[14];
    state[3][3] = in[15];

    // Perform the necessary number of rounds. The round key is added first.
    // The last round does not perform the MixColumns step.
    if (keysize > 128) {
        if (keysize > 192) {
            AddRoundKey(state,&key[56]);
            InvShiftRows(state);InvSubBytes(state);AddRoundKey(state,&key[52]);InvMixColumns(state);
            InvShiftRows(state);InvSubBytes(state);AddRoundKey(state,&key[48]);InvMixColumns(state);
        }
        else {
            AddRoundKey(state,&key[48]);
        }
        InvShiftRows(state);InvSubBytes(state);AddRoundKey(state,&key[44]);InvMixColumns(state);
        InvShiftRows(state);InvSubBytes(state);AddRoundKey(state,&key[40]);InvMixColumns(state);
    }
    else {
        AddRoundKey(state,&key[40]);
    }
    InvShiftRows(state);InvSubBytes(state);AddRoundKey(state,&key[36]);InvMixColumns(state);
    InvShiftRows(state);InvSubBytes(state);AddRoundKey(state,&key[32]);InvMixColumns(state);
    InvShiftRows(state);InvSubBytes(state);AddRoundKey(state,&key[28]);InvMixColumns(state);
    InvShiftRows(state);InvSubBytes(state);AddRoundKey(state,&key[24]);InvMixColumns(state);
    InvShiftRows(state);InvSubBytes(state);AddRoundKey(state,&key[20]);InvMixColumns(state);
    InvShiftRows(state);InvSubBytes(state);AddRoundKey(state,&key[16]);InvMixColumns(state);
    InvShiftRows(state);InvSubBytes(state);AddRoundKey(state,&key[12]);InvMixColumns(state);
    InvShiftRows(state);InvSubBytes(state);AddRoundKey(state,&key[8]);InvMixColumns(state);
    InvShiftRows(state);InvSubBytes(state);AddRoundKey(state,&key[4]);InvMixColumns(state);
    InvShiftRows(state);InvSubBytes(state);AddRoundKey(state,&key[0]);

    // Copy the state to the output array.
    out[0] = state[0][0];
    out[1] = state[1][0];
    out[2] = state[2][0];
    out[3] = state[3][0];
    out[4] = state[0][1];
    out[5] = state[1][1];
    out[6] = state[2][1];
    out[7] = state[3][1];
    out[8] = state[0][2];
    out[9] = state[1][2];
    out[10] = state[2][2];
    out[11] = state[3][2];
    out[12] = state[0][3];
    out[13] = state[1][3];
    out[14] = state[2][3];
    out[15] = state[3][3];
}

// renderscript glue

int8_t  key[32];
int32_t keysize;
int8_t forEncryption = FALSE;
static WORD keydata[60];

void aes_init_key() {
  aes_key_setup((BYTE *)key, keydata, keysize);
}

uint4 __attribute__((kernel)) process_block(uint4 in4) {
  uint32_t in[4];
  in[0] = in4.s0;
  in[1] = in4.s1;
  in[2] = in4.s2;
  in[3] = in4.s3;

  uint32_t out[4];
  uint4 data;
  if (forEncryption) {
    aes_encrypt((BYTE *)in, (BYTE *)out, keydata, keysize);
  } else {
    aes_decrypt((BYTE *)in, (BYTE *)out, keydata, keysize);
  }
  data.s0 = out[0];
  data.s1 = out[1];
  data.s2 = out[2];
  data.s3 = out[3];
  return data;
}
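As a self-check in the spirit of the test cases mentioned above, the following standalone C sketch recomputes the pieces that aes_key_setup depends on (KE_ROTWORD, SubWord, the S-box, and the GF(2^8) multiplication that gf_mul tabulates) so they can be compared against the FIPS-197 Appendix A.1 key-expansion example. It builds a flat 256-entry S-box at runtime instead of the elided 16x16 SBox constant (aes_sbox[hi][lo] corresponds to sbox[(hi << 4) | lo]), computes Rcon instead of using a fixed array, only handles 128-bit keys, and uses hypothetical helper names (gf_mul_slow, rotl8, build_sbox, sub_word, key_setup_128). It is plain C rather than renderscript, so it can be compiled and run on a desktop.

```c
#include <stdint.h>

/* RotWord from aes.rs: rotates a 32-bit word left by one byte. */
#define KE_ROTWORD(x) (((x) << 8) | ((x) >> 24))

/* GF(2^8) multiplication modulo the AES polynomial x^8 + x^4 + x^3 + x + 1.
 * The gf_mul table in aes.rs caches gf_mul_slow(x, c) for
 * c in {0x02, 0x03, 0x09, 0x0b, 0x0d, 0x0e}. */
static uint8_t gf_mul_slow(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        /* xtime: multiply a by x and reduce when the top bit overflows */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return r;
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* S-box = affine transform of the multiplicative inverse (inverse of 0 taken as 0). */
static void build_sbox(uint8_t sbox[256])
{
    for (int x = 0; x < 256; ++x) {
        uint8_t inv = 0;
        for (int y = 1; x != 0 && y < 256; ++y) {
            if (gf_mul_slow((uint8_t)x, (uint8_t)y) == 1) {
                inv = (uint8_t)y;
                break;
            }
        }
        sbox[x] = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                            rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
    }
}

/* SubWord: apply the S-box to each byte of the word. */
static uint32_t sub_word(const uint8_t sbox[256], uint32_t w)
{
    return ((uint32_t)sbox[(w >> 24) & 0xff] << 24) |
           ((uint32_t)sbox[(w >> 16) & 0xff] << 16) |
           ((uint32_t)sbox[(w >> 8) & 0xff] << 8) |
           (uint32_t)sbox[w & 0xff];
}

/* AES-128 key expansion (Nk = 4, Nr = 10, so 44 words), packing key bytes
 * big-endian into words exactly like aes_key_setup does. */
static void key_setup_128(const uint8_t sbox[256], const uint8_t key[16], uint32_t w[44])
{
    uint32_t rcon = 0x01000000u;
    for (int i = 0; i < 4; ++i)
        w[i] = ((uint32_t)key[4 * i] << 24) | ((uint32_t)key[4 * i + 1] << 16) |
               ((uint32_t)key[4 * i + 2] << 8) | (uint32_t)key[4 * i + 3];
    for (int i = 4; i < 44; ++i) {
        uint32_t temp = w[i - 1];
        if (i % 4 == 0) {
            temp = sub_word(sbox, KE_ROTWORD(temp)) ^ rcon;
            /* next round constant: multiply its top byte by x (0x02) */
            rcon = (uint32_t)gf_mul_slow((uint8_t)(rcon >> 24), 0x02) << 24;
        }
        w[i] = w[i - 4] ^ temp;
    }
}
```

With the FIPS-197 key 2b7e151628aed2a6abf7158809cf4f3c, key_setup_128 should produce w[4] = 0xa0fafe17 and w[43] = 0xb6630ca6, and build_sbox should give S(0x53) = 0xed. Comparing keydata against known values like these is one way to catch byte-order mistakes in the key before the block function is involved.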

The fact that the java side is written in scala has no bearing on performance: it compiles down to the same bytecode as the equivalent java code (verified using javap -c).

I also tried using android's built-in cipher (javax.crypto) directly, with no performance gain. The code for that follows:

import javax.crypto.spec.SecretKeySpec

class JceAESEngine extends BlockCipher {
  var cipher: javax.crypto.Cipher = _
  override def getAlgorithmName = "AES"
  override def getBlockSize = 16

  override def init(b: Boolean, cipherParameters: CipherParameters) = {
    cipherParameters match {
      case kp: KeyParameter =>
        val key = kp.getKey
        val keyspec = new SecretKeySpec(key, getAlgorithmName)
        cipher = javax.crypto.Cipher.getInstance(getAlgorithmName + "/ECB/NoPadding")
        cipher.init(if (b) javax.crypto.Cipher.ENCRYPT_MODE else
          javax.crypto.Cipher.DECRYPT_MODE,
          keyspec)
    }
  }

  override def processBlock(bytes: Array[Byte], i: Int, bytes1: Array[Byte], i1: Int) =
    cipher.update(bytes, i, 16, bytes1, i1)

  override def reset() = cipher = null
}
Tags: android, scala, encryption, renderscript
asked on Stack Overflow May 21, 2015 by pfn • edited May 21, 2015 by pfn

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0