How to declare a 16-bit pointer to a string in the GCC C compiler for an ARM processor


I tried to declare an array of short pointers to strings (16 bits instead of the default 32 bits) with the GNU GCC C compiler for an ARM Cortex-M0 processor, to reduce flash consumption. I have about 200 strings in two languages, so shrinking each pointer from 32 bits to 16 bits could save 800 bytes of flash (400 pointers × 2 bytes each). It should be possible because the flash is smaller than 64 kB, so the high half-word (16 bits) of every pointer into flash is constant and equal to 0x0800:

const unsigned char str1[] ="First string";
const unsigned char str2[] ="Second string";
const unsigned short ptrs[] = {&str1, &str2};    // this line generates an error

but I got an error on the third line:

"error: initializer element is not computable at load time"

Then I tried:

const unsigned short ptr1 = (&str1 & 0xFFFF);

and got: "error: invalid operands to binary & (have 'const unsigned char (*)[11]' and 'int')"

After many attempts I ended up writing it in assembly:

  .section .rodata.strings
  .align 2
ptr0:
ptr3:   .short (str3-str0)
ptr4:   .short (str4-str0)

str0:
str3:   .asciz  "3-th string"
str4:   .asciz  "4-th string"

This assembles fine, but now I have a problem referencing the labels ptr4 and ptr0 from C code. I try to pass "ptr4-ptr0" as an 8-bit argument to a C function:

ptr = getStringFromTable (ptr4-ptr0);

declared as:

const unsigned char* getStringFromTable (unsigned char stringIndex);

but I get the wrong code: it loads the bytes stored at ptr4 and ptr0 at run time and subtracts them:

ldr     r3, [pc, #28]   ; (0x8000a78 <main+164>)
ldrb    r1, [r3, #0]
ldr     r3, [pc, #28]   ; (0x8000a7c <main+168>)
ldrb    r3, [r3, #0]
subs    r1, r1, r3
uxtb    r1, r1
bl      0x8000692 <getStringFromTable>

instead of something like this:

movs    r0, #2
bl      0x8000692 <getStringFromTable>

I would be grateful for any suggestion.

Update, a few days later:

Following the advice of @TonyK and @old_timer, I finally solved the problem as follows. In assembly I wrote:

  .global str0,  ptr0
  .section .rodata.strings
  .align 2
ptr0:   .short (str3-str0)
        .short (str4-str0)

str0:
str3:   .asciz  "3-th string"
str4:   .asciz  "4-th string"

Then I declared in C:

extern const unsigned short ptr0[];   // offset table, lives in .rodata
extern const unsigned char str0[];    // base address of the strings

enum ptrs {ptr3, ptr4};        //automatically: ptr3=0, ptr4=1

const unsigned char* getStringFromTable (enum ptrs index)
  {
  return &str0[ptr0[index]] ;
  }

and now this call:

ptr = getStringFromTable (ptr4);

is compiled to the correct code:

08000988: 0x00000120   movs    r0, #1
0800098a: 0xfff745ff   bl      0x8000818 <getStringFromTable>

I just have to remember to keep enum ptrs in the same order as the offset table every time I add a string to the assembly and a new item to enum ptrs.

asked on Stack Overflow May 12, 2020 by Miroslaw • edited May 20, 2020 by Miroslaw

2 Answers


Declare ptr0 and str0 as .global in your assembly language file. Then in C:

extern const unsigned short ptr0[] ;
extern const char str0[] ;
const char* getStringFromTable (unsigned char index)
  {
  return &str0[ptr0[index]] ;
  }

This works as long as the total size of the str0 table is less than 64K.

answered on Stack Overflow May 13, 2020 by TonyK

A pointer is an address, and addresses on ARM cannot be 16 bits; that makes no sense. Apart from the early Acorn-era ARMs (26-bit addressing, if I remember right), ARM addresses are at least 32 bits, and with AArch64 they get larger, never smaller.

This

ptr3:   .short (str3-str0)

does not produce an address (so it can't be a pointer); it produces an offset that is only usable once you add it to the base address str0.
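A minimal host-side sketch of that distinction, assuming three example strings packed into one array (the names pool, offsets and string_at are illustrative, not from the question): the table holds only 16-bit offsets, and a usable pointer appears only after an offset is added to the full-width base address.

```c
#include <stddef.h>

/* Illustrative pool: three NUL-terminated strings stored back to back. */
static const char pool[] = "First string\0Second string\0Third string";

/* 16-bit offsets into pool. They are NOT addresses: on their own they
   point nowhere. Each costs 2 bytes, versus a 4-byte pointer on Cortex-M0. */
static const unsigned short offsets[] = { 0, 13, 27 };

/* Only base address + offset yields a real (full-width) pointer. */
const char *string_at(size_t index)
{
    return pool + offsets[index];
}
```

string_at(1) returns a pointer to "Second string"; the offsets themselves never leave the table.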

You cannot generate 16-bit addresses (with a debugged, usable ARM compiler), but since everything here appears to be static (const/rodata), the problem is easy to solve. It could be solved at run time as well, but it is even simpler to precompute everything from the information provided so far.

const unsigned char str1[] ="First string";
const unsigned char str2[] ="Second string";
const unsigned char str3[] ="Third string";

A brute-force tool takes about 30 lines of code to produce the header file below (fewer if you compact it, although ad-hoc programs don't need to be pretty).

The output is intentionally long, to demonstrate the solution and to make it easy to check the tool visually. The compiler doesn't care, so it is best to make it long and verbose for readability and validation:

mystrings.h

const unsigned char strs[40]=
{
  0x46, //  0 F
  0x69, //  1 i
  0x72, //  2 r
  0x73, //  3 s
  0x74, //  4 t
  0x20, //  5  
  0x73, //  6 s
  0x74, //  7 t
  0x72, //  8 r
  0x69, //  9 i
  0x6E, // 10 n
  0x67, // 11 g
  0x00, // 12 
  0x53, // 13 S
  0x65, // 14 e
  0x63, // 15 c
  0x6F, // 16 o
  0x6E, // 17 n
  0x64, // 18 d
  0x20, // 19  
  0x73, // 20 s
  0x74, // 21 t
  0x72, // 22 r
  0x69, // 23 i
  0x6E, // 24 n
  0x67, // 25 g
  0x00, // 26 
  0x54, // 27 T
  0x68, // 28 h
  0x69, // 29 i
  0x72, // 30 r
  0x64, // 31 d
  0x20, // 32  
  0x73, // 33 s
  0x74, // 34 t
  0x72, // 35 r
  0x69, // 36 i
  0x6E, // 37 n
  0x67, // 38 g
  0x00, // 39 
};
const unsigned short ptrs[3]=
{
  0x0000, //  0   0
  0x000D, //  1  13
  0x001B, //  2  27
};

The compiler then handles all of the address generation when you use it:

&strs[ptrs[n]]

Depending on how you write your tool, the generated header can even contain things like

#define FIRST_STRING 0
#define SECOND_STRING 1

and so on, so that your code can find the string with

&strs[ptrs[SECOND_STRING]]

making the program that much more readable, all auto-generated by an ad-hoc tool that does the offset work for you.

The main() part of the tool could look like:

add_string(FIRST_STRING,"First string");
add_string(SECOND_STRING,"Second string");
add_string(THIRD_STRING,"Third string");

together with that function and some more code to dump the result.
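One possible shape for that function and the dump code, as a sketch only: add_string, dump_header, MAX_STRINGS and POOL_SIZE are hypothetical names, and the string count limit is an assumption. It packs each string (with its terminator) into a pool, records the starting offset under the given id, and prints a header in the same layout as mystrings.h.

```c
#include <stdio.h>
#include <string.h>

#define MAX_STRINGS 256    /* assumed upper bound on the number of strings */
#define POOL_SIZE   65536  /* 16-bit offsets can reach at most 64 KiB */

static unsigned char  pool[POOL_SIZE];
static size_t         pool_len;
static unsigned short offsets[MAX_STRINGS];
static unsigned       string_count;

/* Append text (including its NUL) to the pool and record its offset under id.
   Returns 0 on success, -1 if id or the pool size is out of range. */
int add_string(unsigned id, const char *text)
{
    size_t len = strlen(text) + 1;

    if (id >= MAX_STRINGS || pool_len + len > POOL_SIZE)
        return -1;
    offsets[id] = (unsigned short)pool_len;
    memcpy(pool + pool_len, text, len);
    pool_len += len;
    if (id >= string_count)
        string_count = id + 1;
    return 0;
}

/* Print a header shaped like mystrings.h: one byte per line, then the offsets. */
void dump_header(FILE *out)
{
    fprintf(out, "const unsigned char strs[%zu]=\n{\n", pool_len);
    for (size_t i = 0; i < pool_len; i++) {
        unsigned c = pool[i];
        fprintf(out, "  0x%02X, // %2zu %c\n", c, i, c ? (int)c : ' ');
    }
    fprintf(out, "};\nconst unsigned short ptrs[%u]=\n{\n", string_count);
    for (unsigned i = 0; i < string_count; i++)
        fprintf(out, "  0x%04X, // %2u %3u\n",
                (unsigned)offsets[i], i, (unsigned)offsets[i]);
    fprintf(out, "};\n");
}
```

A main() that calls add_string(FIRST_STRING, "First string") and so on, then dump_header(stdout) with the output redirected into mystrings.h, would complete the tool; FIRST_STRING and friends would be plain #defines (0, 1, 2, ...) shared with the application.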

Then you simply include the generated output and use

&strs[ptrs[THIRD_STRING]]

syntax in the real application.

If you prefer to continue down the path you started, that works too (it looks like more work, but it is still pretty quick to code):

ptr0:
ptr3:   .short (str3-str0)
ptr4:   .short (str4-str0)

str0:
str3:   .asciz  "3-th string"
str4:   .asciz  "4-th string"

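Then you need to export str0, ptr3 and ptr4 (as needed, with whatever directive your assembler uses, e.g. .global in GNU as) and access the string as the base address str0 plus the offset stored at ptr3:

extern const unsigned char str0[];   // label of the first string (base address)
extern const unsigned short ptr3;    // 16-bit offset stored at label ptr3
...
... *(str0 + ptr3)                   // first character of the string at that offset

(fix whatever syntax mistakes I intentionally or unintentionally added to that pseudo-code).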

That would work as well: you would have one base address plus hundreds of 16-bit offsets from it.

You could even do some flavor of

const unsigned short ptrs[]={ptr0,ptr1,ptr2,ptr3};
...
(unsigned char *)(str0+ptrs[n])

using some flavor of C syntax to create that array, but it is probably not worth the extra effort.

The solution a few of us have mentioned so far (one example is demonstrated above) uses 16-bit offsets, which are NOT addresses and therefore NOT pointers. It is much easier to code, maintain and use, and perhaps to read, depending on your implementation. However it is implemented, it requires one full-sized base address plus the offsets. It might be possible to code this in C without an ad-hoc tool, but the ad-hoc tool literally takes only a few minutes to write.
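For what it's worth, here is one way the pure-C route could look. It is a sketch assuming a C11 compiler (for _Static_assert), and it swaps in a technique neither answer uses: the X-macro pattern plus offsetof. Every name in it (STRING_TABLE, string_id, string_pool, get_string) is hypothetical. A single list generates the enum, a struct with one exactly-sized char array per string, and a table of 16-bit offsets, so the enum order cannot drift from the table (the maintenance problem noted at the end of the question).

```c
#include <stddef.h>

/* Hypothetical single list of strings: edit only here to add a string. */
#define STRING_TABLE(X)               \
    X(FIRST_STRING,  "First string")  \
    X(SECOND_STRING, "Second string") \
    X(THIRD_STRING,  "Third string")

/* 1. Index enum, generated in list order (FIRST_STRING = 0, ...). */
enum string_id {
#define X(id, text) id,
    STRING_TABLE(X)
#undef X
    STRING_COUNT
};

/* 2. One char array per string, sized exactly (sizeof includes the NUL). */
struct string_pool {
#define X(id, text) char s_##id[sizeof(text)];
    STRING_TABLE(X)
#undef X
};

_Static_assert(sizeof(struct string_pool) <= 65536u,
               "string pool too large for 16-bit offsets");

/* 3. The packed strings themselves, stored once in read-only data. */
static const struct string_pool pool = {
#define X(id, text) text,
    STRING_TABLE(X)
#undef X
};

/* 4. 16-bit offsets: offsetof is an integer constant expression, so this
      initializer is accepted where (&str1 & 0xFFFF) was not. */
static const unsigned short offsets[STRING_COUNT] = {
#define X(id, text) (unsigned short)offsetof(struct string_pool, s_##id),
    STRING_TABLE(X)
#undef X
};

/* Base address of the pool + 16-bit offset = pointer to the string. */
const char *get_string(enum string_id id)
{
    return (const char *)&pool + offsets[id];
}
```

get_string(SECOND_STRING) returns a pointer into the packed pool, and adding a string means adding one X(...) line. Because offsetof reports the real member offsets, the lookup stays correct even if a compiler inserted padding; in practice char arrays are laid out back to back with none.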

I write programs that write programs, or that compress/manipulate data, almost daily; why not? Compression is a good example. Want to embed a black-and-white image in your resource-limited MCU flash? Don't put all the pixels in the binary; start with a run-length encoding and go from there. That means a third-party tool, written by you or not, that converts the real data into a structure that fits. Same thing here: a third-party tool prepares/compresses the data for the application. This problem is really just another compression algorithm, since you are trying to reduce the amount of data without losing any.

Also note that, depending on what these strings are, if duplicates or fragments (a string that is the tail of another) are possible, the tool can be even smarter:

const unsigned char str1[] ="First string";
const unsigned char str2[] ="Second string";
const unsigned char str3[] ="Third string";
const unsigned char str4[] ="string";
const unsigned char str5[] ="Third string";

creating

const unsigned char strs[40]=
{
  0x46, //  0 F
  0x69, //  1 i
  0x72, //  2 r
  0x73, //  3 s
  0x74, //  4 t
  0x20, //  5  
  0x73, //  6 s
  0x74, //  7 t
  0x72, //  8 r
  0x69, //  9 i
  0x6E, // 10 n
  0x67, // 11 g
  0x00, // 12 
  0x53, // 13 S
  0x65, // 14 e
  0x63, // 15 c
  0x6F, // 16 o
  0x6E, // 17 n
  0x64, // 18 d
  0x20, // 19  
  0x73, // 20 s
  0x74, // 21 t
  0x72, // 22 r
  0x69, // 23 i
  0x6E, // 24 n
  0x67, // 25 g
  0x00, // 26 
  0x54, // 27 T
  0x68, // 28 h
  0x69, // 29 i
  0x72, // 30 r
  0x64, // 31 d
  0x20, // 32  
  0x73, // 33 s
  0x74, // 34 t
  0x72, // 35 r
  0x69, // 36 i
  0x6E, // 37 n
  0x67, // 38 g
  0x00, // 39 
};
const unsigned short ptrs[5]=
{
  0x0000, //  0   0
  0x000D, //  1  13
  0x001B, //  2  27
  0x0006, //  3   6
  0x001B, //  4  27
};
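A hedged sketch of that smarter behavior: before appending, the hypothetical add_string_shared below scans the pool for an existing copy whose tail matches the new string (terminator included) and returns that offset instead. It is a brute-force search over the whole pool for every string, which assumes a pool of a few kilobytes, as in the question.

```c
#include <string.h>

#define POOL_SIZE 65536  /* 16-bit offsets reach at most 64 KiB */

static char   pool[POOL_SIZE];
static size_t pool_len;

/* Return the offset of text in the pool, reusing an existing copy when text
   equals an already-stored string or is a tail (suffix) of one; otherwise
   append it. Returns -1 if the pool is full. */
long add_string_shared(const char *text)
{
    size_t len = strlen(text) + 1;   /* bytes including the NUL */

    /* Matching all len bytes, terminator included, means some stored string
       ends with exactly text, so its tail can be shared. */
    for (size_t i = 0; i + len <= pool_len; i++) {
        if (memcmp(pool + i, text, len) == 0)
            return (long)i;
    }
    if (pool_len + len > POOL_SIZE)
        return -1;
    memcpy(pool + pool_len, text, len);
    pool_len += len;
    return (long)(pool_len - len);
}
```

Insertion order matters: adding "string" before "First string" would store it twice, so a tool could sort the strings longest-first before adding them.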
answered on Stack Overflow May 13, 2020 by old_timer • edited May 22, 2020 by halfer

User contributions licensed under CC BY-SA 3.0