Issues using custom HLS block under Linux, despite a validated bare-metal design


I have written an RSA encryption block in HLS (using Vivado 2017.2) and am trying to exercise it under Linux on the ZedBoard (Zynq 7020). I have verified that the hardware works, and I have a fully functioning bare-metal software interface to it. Note that the HLS block is an AXI4-Lite slave.

Things started to get weird when I wrote a Linux kernel module (device driver) for it. Note that I'm deliberately not going through the Linux UIO framework, for reasons of my own, and am writing my own kernel module instead. The problem was that each time I started the block with the inputs held constant, the result changed. This is not how the hardware block behaves, as I confirmed with a bare-metal test in SDK. So, in light of this, it must be something weird in my kernel module... right?

Well, actually, no. Not wanting to waste time debugging in the kernel, I decided to get some ground-truth data by bypassing the kernel module altogether and using devmem2 to read and write the hardware directly, just to make sure I wasn't going insane and to reassure myself once more that the hardware works. However, to my dismay, writing the input registers manually with devmem2 also returned garbage (incorrect) data, and it returned different garbage every time I started the block, even when I did not change the inputs.
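For reference, devmem2 itself is a small userspace program: the widely distributed devmem2.c opens /dev/mem with O_SYNC, mmap()s the 4 KiB page containing the target address, and dereferences a pointer into that page. A minimal C sketch of the same mechanism follows, assuming a Linux userspace with root access and a kernel that permits /dev/mem access to this address range; `phys_to_page` and `map_regs` are hypothetical helper names, not part of devmem2 or any Xilinx library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Split a physical address into the page-aligned base that mmap() needs
 * and the register's byte offset inside that page (devmem2 masks the
 * address the same way). page_size must be a power of two. */
void phys_to_page(uint64_t phys, uint64_t page_size,
                  uint64_t *page_base, uint64_t *page_off)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    *page_base = phys & ~(page_size - 1);
    *page_off  = phys & (page_size - 1);
}

/* Map `len` bytes of physical memory starting at the page-aligned address
 * `phys` (0x43C00000 is page-aligned). Needs root. Returns NULL on error;
 * release the window later with munmap((void *)p, len). */
volatile uint32_t *map_regs(off_t phys, size_t len)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) {
        perror("open /dev/mem");
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys);
    close(fd); /* the mapping remains valid after close() */
    if (p == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    return (volatile uint32_t *)p;
}
```

Mapping the whole register window once (for example `map_regs(0x43C00000, 4096)`) lets a single program write the operands, start the block and read the result, instead of launching one devmem2 process per register.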

I checked again that the hardware works bare-metal. It does. I even printed out the EXACT values I was writing into the AXI registers, in case I was messing up the byte order (since each 1024-bit input spans 32 input registers, I figured maybe I was writing the words backwards?). Once I had the EXACT values my bare-metal driver writes to the input ports, I wrote a script that writes each register manually using devmem2, to be absolutely SURE that I was writing the correct values.
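The word order the script relies on can be stated precisely. On the Zynq's little-endian Cortex-A9, copying a 128-byte operand into 32 consecutive 32-bit words puts bytes 4i..4i+3 into word i, with byte 4i as the least-significant byte. The sketch below reproduces that mapping, assuming the generated `XWsrsa1024_*_v` structs are simply 32 consecutive 32-bit words; `le_word` and `pack_operand` are hypothetical names. Applied to `modulus_arr`, it yields 0x73EBF549 and 0xEB9C825B for words 0 and 1, matching the first two modulus writes in the devmem2 script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RSA_WORDS 32 /* 1024 bits / 32 bits per AXI4-Lite register */

/* Assemble word i from bytes[4i..4i+3], least-significant byte first
 * (little-endian), i.e. what memcpy() into a uint32_t array does on a
 * little-endian CPU. */
uint32_t le_word(const uint8_t *bytes, size_t i)
{
    assert(i < RSA_WORDS);
    const uint8_t *b = bytes + 4 * i;
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Convert a 128-byte operand into the 32 words written to consecutive
 * registers, word 0 going to the lowest register address. */
void pack_operand(const uint8_t bytes[4 * RSA_WORDS], uint32_t words[RSA_WORDS])
{
    for (size_t i = 0; i < RSA_WORDS; i++)
        words[i] = le_word(bytes, i);
}
```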

Once I have loaded the input values (base, exponent, modulus, and operating mode), I write 0x1 to the ap_ctrl register to start the block.
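The value 0x1 written to ap_ctrl is the ap_start bit. For reference, here is a sketch of the usual Vivado HLS ap_ctrl bit layout (as described in the comments of the generated `*_hw.h` header) together with a small decoder; the macro, type and function names are hypothetical, and the bit positions should be checked against the header generated for this particular block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ap_ctrl (offset 0x00) bits of a Vivado HLS AXI4-Lite control interface. */
#define AP_START        (1u << 0) /* R/W: write 1 to start the block      */
#define AP_DONE         (1u << 1) /* R/COR: cleared when ap_ctrl is read   */
#define AP_IDLE         (1u << 2) /* R: block is idle                      */
#define AP_READY        (1u << 3) /* R: block can accept new inputs        */
#define AP_AUTO_RESTART (1u << 7) /* R/W: restart automatically when done  */

typedef struct {
    bool start, done, idle, ready, auto_restart;
} ap_ctrl_status;

/* Decode a raw ap_ctrl value, e.g. one printed by `devmem2 0x43C00000 w`. */
ap_ctrl_status ap_ctrl_decode(uint32_t v)
{
    ap_ctrl_status s = {
        .start        = (v & AP_START) != 0,
        .done         = (v & AP_DONE) != 0,
        .idle         = (v & AP_IDLE) != 0,
        .ready        = (v & AP_READY) != 0,
        .auto_restart = (v & AP_AUTO_RESTART) != 0,
    };
    return s;
}
```

Under this layout, the script's `devmem2 0x43C00000 b 0` followed by `b 1` clears auto_restart and then sets ap_start. Because ap_done is clear-on-read, any read of ap_ctrl (including one done with devmem2) consumes the done flag.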

The answer is STILL different every time.

I have NO IDEA what is going on here. Again, the hardware WORKS when I write data to it using the Xilinx bare-metal drivers. But under Linux, even when writing directly to physical memory with devmem2, everything breaks. And I know it's most likely not the hardware, because it returns a DIFFERENT incorrect answer every time.

So I have absolutely no idea how to proceed with debugging. For reference, let's compare my bare-metal driver with the Linux devmem2 method:

Bare-metal encryption function:

uint8_t privexp_arr[] = {0xA1,0x11,0xAD,0xAD,0x48,0x88,0xF5,0x2D,0x35,0xF5,0x42,0x8E,0x39,0x39,0x68,0x06,0xBE,0x32,0x52,0x5C,0xDA,0x2B,0xF2,0x2A,0x27,0x58,0x1B,0xDE,0xEE,0x18,0x63,0x92,0xD8,0x9F,0x02,0x2C,0xFB,0xDF,0x77,0xE6,0x1F,0xDB,0xDC,0x84,0x6C,0x90,0x38,0xA0,0x8D,0x8A,0xEB,0x5C,0x2A,0xF7,0xCC,0x25,0x9D,0x62,0xBA,0xB5,0xB2,0xB8,0x7B,0xCD,0x66,0xD6,0x77,0xD5,0x32,0x9D,0xF1,0x98,0x9C,0xB1,0xAC,0x50,0x23,0x7C,0xCF,0x28,0x69,0x32,0xD9,0x3A,0x21,0x82,0x9D,0xE0,0xE1,0xBA,0x12,0x3C,0x79,0x95,0x10,0x7A,0x50,0x6E,0xA2,0x91,0x87,0x04,0x2B,0x6F,0xE4,0x8C,0x05,0x51,0x31,0x81,0x50,0xE9,0x52,0x69,0x09,0xCF,0x68,0x1D,0x74,0x88,0x6B,0x17,0x43,0xE8,0xFD,0x9C,0x7B,0x04};
uint32_t publexp_arr [] = {0x10001,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
uint8_t modulus_arr[] = {0x49,0xF5,0xEB,0x73,0x5B,0x82,0x9C,0xEB,0x4B,0xC2,0xAF,0x74,0x64,0x29,0x38,0xA8,0xAF,0x7E,0xA4,0x77,0xBA,0x9C,0x79,0xB6,0x9B,0x5E,0x65,0xBC,0xBA,0x74,0x84,0x3E,0x84,0xBF,0x5C,0xD4,0xD1,0xF4,0xEC,0xD4,0x83,0x3D,0xC6,0x9B,0x7B,0x52,0x5C,0x2F,0x25,0x79,0x6D,0x21,0x79,0xB3,0x31,0x7A,0x0D,0xAD,0xB1,0xB9,0xDC,0x5F,0xE5,0x3D,0x13,0x21,0xF6,0xFB,0x97,0x1A,0xFB,0xB9,0x7F,0x4D,0x26,0x0F,0x10,0x37,0xEA,0xEA,0xEC,0x97,0xA4,0x79,0x37,0xFB,0x62,0x33,0x9E,0xB3,0x28,0xC4,0x30,0x8A,0xA6,0x94,0x9A,0x9F,0x0D,0xDF,0xE2,0xF5,0xB4,0x1F,0x25,0x4F,0xE1,0x6F,0x35,0xBF,0x82,0xBF,0xE6,0xA2,0xA0,0x15,0x80,0xA1,0x69,0x97,0xD8,0x3D,0x85,0x88,0x9E,0x88,0x4D,0xD9};
const uint8_t ciphertext_golden_ans[] = {0xF0,0xCA,0x37,0xC7,0xFA,0x38,0xB3,0xDF,0x00,0xA6,0xFA,0x10,0x14,0xEA,0xD7,0x36,0x83,0x61,0x5F,0x12,0x29,0x6C,0x19,0xC3,0x3A,0xC6,0x03,0xC9,0x74,0xF2,0x9E,0x57,0x68,0x2C,0xA8,0xAD,0xE6,0xAF,0x27,0x35,0xEF,0xD6,0x33,0x34,0xA8,0x0F,0x8E,0x2D,0x84,0xA5,0xA9,0xF3,0xC6,0x9A,0xF7,0xC9,0xB6,0x9B,0x12,0x0E,0xF3,0x40,0x6E,0x8E,0x2A,0x40,0x4B,0x6C,0x63,0x6B,0x42,0xEC,0xE6,0xB5,0x2E,0x1D,0x5A,0x95,0xFF,0x8E,0xAF,0xB3,0x24,0x8D,0x88,0x01,0x61,0x42,0x1D,0xA9,0x80,0x93,0xD2,0xE9,0x04,0x30,0x63,0x43,0x16,0xC1,0xD0,0xCC,0xFD,0xD1,0xA0,0xA8,0xC3,0xD0,0x73,0xF6,0x66,0x38,0x95,0x42,0xA1,0x75,0x77,0xD1,0xE2,0xBB,0xB8,0x49,0x7B,0x78,0x6F,0x66,0x44,0x93};
const uint32_t plaintext_golden_ans[] = {0x726C6421,0x2C20576F,0x656C6C6F,0x00000048,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};

int32_t wsrsa_encrypt(const uint8_t *plaintext, const uint8_t *publexp, const uint8_t *modulus, uint8_t *ciphertext)
{
    // Set Base
    XWsrsa1024_Base_v plaintext_st;
    memcpy(&plaintext_st, plaintext, sizeof(XWsrsa1024_Base_v));
    XWsrsa1024_Set_base_V(&xrsamodexp,plaintext_st);

    // Set public exponent
    XWsrsa1024_Publexp_v publexp_st;
    memcpy(&publexp_st, publexp, sizeof(XWsrsa1024_Publexp_v));
    XWsrsa1024_Set_publexp_V(&xrsamodexp, publexp_st);

    // Set Modulus
    XWsrsa1024_Modulus_v modulus_st;
    memcpy(&modulus_st, modulus, sizeof(modulus_st));
    XWsrsa1024_Set_modulus_V(&xrsamodexp, modulus_st);

    // Create empty result struct and initialize it to zero
    XWsrsa1024_Result_v ciphertext_st;
    memset(&ciphertext_st, 0, sizeof(XWsrsa1024_Result_v));

    // Print input data for debugging
    xil_printf("BASE DATA = \n");   printBaseData(&xrsamodexp,plaintext_st);
    xil_printf("EXPO DATA = \n");   printExpData(&xrsamodexp,publexp_st);
    xil_printf("MODU DATA = \n");   printModData(&xrsamodexp,modulus_st);

    // Set mode to encrypt
    XWsrsa1024_Set_mode(&xrsamodexp,ENCRYPT);

    // Start hardware block
    XWsrsa1024_Start(&xrsamodexp);
    // wait for result
    while( !XWsrsa1024_IsDone(&xrsamodexp));

    // read back data into local buffer
    ciphertext_st = XWsrsa1024_Get_result_V(&xrsamodexp);

    // copy local struct data into user buffer
    memcpy(ciphertext, &ciphertext_st, sizeof(XWsrsa1024_Result_v));

    // compare result against golden truth data, and fail if it's wrong
    if (memcmp(ciphertext, ciphertext_golden_ans, sizeof(XWsrsa1024_Result_v)))
    {
        printf("ERROR, CIPHERTEXT IS INCORRECT\n");
        return XST_FAILURE;
    }
    else
        return XST_SUCCESS;
}

int32_t rsa_test(void)
{
    uint8_t result[RSA_NUM_BYTES];
    // initialize RSA block
    int32_t ret = rsa_init();
    if (ret != XST_SUCCESS)
    {
        xil_printf("RSA init error!\n");
        return ret;
    }

    // Test public encryption on plaintext, comparing against known ciphertext
    ret = wsrsa_encrypt((uint8_t *)plaintext_golden_ans, (uint8_t *)publexp_arr, modulus_arr, result);
    xil_printf("Enc result = "); printHex(result, RSA_NUM_BYTES);
    return ret;
}

And manually writing to the input registers using devmem2:

#!/bin/sh

# Set mode to 0
devmem2 0x43c00010 b 0

# WRITE BASE WORD-BY-WORD
devmem2 0x43C00018 w 0x726C6421
devmem2 0x43C0001C w 0x2C20576F
devmem2 0x43C00020 w 0x656C6C6F
devmem2 0x43C00024 w 0x00000048
devmem2 0x43C00028 w 0x00000000
devmem2 0x43C0002C w 0x00000000
devmem2 0x43C00084 w 0x00000000
# ....
#  zero writes through 0x43C00094 ....
# ....
devmem2 0x43C00094 w 0x00000000


# WRITE EXPONENT WORD-BY-WORD
devmem2 0x43C0009C w 0x00010001
devmem2 0x43C000A0 w 0x00000000
devmem2 0x43C000A4 w 0x00000000
# ....
#  zero writes through 0x43C00118 ....
# ....
devmem2 0x43C00118 w 0x00000000


# WRITE MODULUS WORD-BY-WORD
devmem2 0x43C00120 w 0x73EBF549
devmem2 0x43C00124 w 0xEB9C825B
devmem2 0x43C00128 w 0x74AFC24B
devmem2 0x43C0012C w 0xA8382964
devmem2 0x43C00130 w 0x77A47EAF
devmem2 0x43C00134 w 0xB6799CBA
devmem2 0x43C00138 w 0xBC655E9B
devmem2 0x43C0013C w 0x3E8474BA
devmem2 0x43C00140 w 0xD45CBF84
devmem2 0x43C00144 w 0xD4ECF4D1
devmem2 0x43C00148 w 0x9BC63D83
devmem2 0x43C0014C w 0x2F5C527B
devmem2 0x43C00150 w 0x216D7925
devmem2 0x43C00154 w 0x7A31B379
devmem2 0x43C00158 w 0xB9B1AD0D
devmem2 0x43C0015C w 0x3DE55FDC
devmem2 0x43C00160 w 0xFBF62113
devmem2 0x43C00164 w 0xB9FB1A97
devmem2 0x43C00168 w 0x0F264D7F
devmem2 0x43C0016C w 0xEAEA3710
devmem2 0x43C00170 w 0x79A497EC
devmem2 0x43C00174 w 0x3362FB37
devmem2 0x43C00178 w 0xC428B39E
devmem2 0x43C0017C w 0x94A68A30
devmem2 0x43C00180 w 0xDF0D9F9A
devmem2 0x43C00184 w 0x1FB4F5E2
devmem2 0x43C00188 w 0x6FE14F25
devmem2 0x43C0018C w 0xBF82BF35
devmem2 0x43C00190 w 0x15A0A2E6
devmem2 0x43C00194 w 0x9769A180
devmem2 0x43C00198 w 0x88853DD8
devmem2 0x43C0019C w 0xD94D889E


# disable autorestart
devmem2 0x43C00000 b 0

# start block
devmem2 0x43C00000 b 1

# read back the results to the console
devmem2 0x43C001A4 w
devmem2 0x43C001A8 w
devmem2 0x43C001AC w
devmem2 0x43C001B0 w
devmem2 0x43C001B4 w
devmem2 0x43C001B8 w
devmem2 0x43C001BC w
devmem2 0x43C001C0 w
devmem2 0x43C001C4 w
devmem2 0x43C001C8 w
devmem2 0x43C001CC w
devmem2 0x43C001D0 w
devmem2 0x43C001D4 w
devmem2 0x43C001D8 w
devmem2 0x43C001DC w
devmem2 0x43C001E0 w
devmem2 0x43C001E4 w
devmem2 0x43C001E8 w
devmem2 0x43C001EC w
devmem2 0x43C001F0 w
devmem2 0x43C001F4 w
devmem2 0x43C001F8 w
devmem2 0x43C001FC w
devmem2 0x43C00200 w
devmem2 0x43C00204 w
devmem2 0x43C00208 w
devmem2 0x43C0020C w
devmem2 0x43C00210 w
devmem2 0x43C00214 w
devmem2 0x43C00218 w
devmem2 0x43C0021C w
devmem2 0x43C00220 w

Every time I run the devmem2 script, the results are different. But I can loop my bare-metal test program, setting ap_start on every iteration, and the result never changes (as it shouldn't).
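One visible difference between the two listings is the handshake: the bare-metal function polls `XWsrsa1024_IsDone()` before calling `XWsrsa1024_Get_result_V()`, while the devmem2 script reads the result registers straight after writing ap_start. Below is a minimal sketch of the script's sequence with that poll added, assuming `regs` points at the block's register window (for example a /dev/mem mapping of 0x43C00000) and using the offsets from the script. `rsa_run` and the `REG_*` names are hypothetical, and whether the missing poll explains the varying results is not established here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets taken from the devmem2 script. */
#define REG_CTRL    0x000u
#define REG_MODE    0x010u
#define REG_BASE    0x018u
#define REG_PUBLEXP 0x09Cu
#define REG_MODULUS 0x120u
#define REG_RESULT  0x1A4u
#define RSA_WORDS   32
#define AP_START    (1u << 0)
#define AP_DONE     (1u << 1)

/* Word index into a uint32_t register window for a byte offset. */
static size_t idx(uint32_t byte_off) { return byte_off / 4; }

/* Load the operands, start the block once, and poll ap_done before reading
 * the result, mirroring the bare-metal Start()/IsDone()/Get_result_V()
 * order. Returns 0 on success, -1 if ap_done was not seen in max_polls. */
int rsa_run(volatile uint32_t *regs, uint32_t mode,
            const uint32_t base[RSA_WORDS], const uint32_t pubexp[RSA_WORDS],
            const uint32_t mod[RSA_WORDS], uint32_t result[RSA_WORDS],
            unsigned long max_polls)
{
    regs[idx(REG_MODE)] = mode;
    for (size_t i = 0; i < RSA_WORDS; i++) {
        regs[idx(REG_BASE) + i]    = base[i];
        regs[idx(REG_PUBLEXP) + i] = pubexp[i];
        regs[idx(REG_MODULUS) + i] = mod[i];
    }
    regs[idx(REG_CTRL)] = AP_START; /* also leaves auto_restart cleared */

    unsigned long n = 0;
    while (!(regs[idx(REG_CTRL)] & AP_DONE)) { /* ap_done is clear-on-read */
        if (++n >= max_polls)
            return -1;
    }
    for (size_t i = 0; i < RSA_WORDS; i++)
        result[i] = regs[idx(REG_RESULT) + i];
    return 0;
}
```

In a kernel module the same sequence would normally go through `iowrite32()`/`ioread32()` on an `ioremap()`ed region rather than plain pointer dereferences.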

I have absolutely no idea how to debug this any further, so any and all help is appreciated.

Brett

linux
xilinx
vivado
zynq
vivado-hls
asked on Stack Overflow Aug 21, 2017 by Brett • edited Aug 21, 2017 by Brett



User contributions licensed under CC BY-SA 3.0