High Performance Bare Metal Abstraction


Recently we had the idea of using templates to create generic, high-performance abstractions for bare-metal development.

Usually every chip manufacturer provides a C header like this:

//The following structure is a POD, so we can rely on its memory layout
struct Periphery{
  volatile uint32_t reg1;
  volatile uint32_t reg2;
};

#define PERIPHERY0BASE 0x000000ab //address where the registers of the periphery start
#define PERIPHERY1BASE 0x000000cd
static Periphery* PERIPHERY0 = (Periphery*)(PERIPHERY0BASE);
#define PERIPHERY1 ((Periphery*)(PERIPHERY1BASE))

Our idea was then to create drivers that are platform-specific but generic for a given periphery type:

template<int addr>
struct Periphery{
  static inline void doSomething(int foo){
    //cast addr to the vendor register struct, which needs a name distinct from this template
    ((Periphery*)(addr))->reg1 = foo;
  }
};
//typedefs for different peripheries of the same type
typedef Periphery<PERIPHERY0BASE> Periphery0;
typedef Periphery<PERIPHERY1BASE> Periphery1;

which would then be used in a platform-independent module like this:

template<class P> 
class DriverUser{ 
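
The body of DriverUser is not shown in the listing. A minimal sketch of how such a module might use its parameter, assuming P exposes a static doSomething(int) as in the driver template: MockPeriphery and the member name run() are hypothetical and exist only so the example can run on a host machine.

```cpp
// Hypothetical stand-in for a platform-specific driver such as Periphery0.
// It records the last value instead of writing a hardware register, so the
// sketch can run on a host.
struct MockPeriphery {
    static int last_value;
    static inline void doSomething(int foo) { last_value = foo; }
};
int MockPeriphery::last_value = 0;

// Platform-independent module. It relies only on P providing a static
// doSomething(int); run() is an assumed name for illustration.
template <class P>
class DriverUser {
public:
    void run(int foo) {
        // Resolved at compile time: no virtual call and no stored pointer.
        P::doSomething(foo);
    }
};
```

On a target, DriverUser<Periphery0> would be instantiated instead of DriverUser<MockPeriphery>; because the call is bound statically, the compiler is free to inline it.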

The point of all this is that we can abstract away from a single periphery on one platform and thus create a generic driver for all peripheries with the same structure within one processor family, e.g. timers, UARTs and so on. Additionally, it allows us to create high-performance, platform-independent modules. For example, we could create pin access that is as efficient as hand-written assembler but at the same time highly reusable:

//normally would be in PCB specific header
typedef Pin<Port0, Pin0> Pin0;
typedef Pin<Port1, Pin7> Pin1;

//application specific code
typedef Pin0 TxPin;
typedef Pin1 RxPin;

void main(){
  SoftwareUart<TxPin,RxPin> my_uart(115200);
  my_uart.send("hello world");
}

It is then possible to implement a SoftwareUart that is completely platform-independent, yet writing high to TxPin would be as efficient as in assembler, all without using macros.
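
To make the idea concrete, here is a rough sketch of how Pin and a transmit-only SoftwareUart could be put together. It is not our actual implementation. MockPort0, SamplingDelay, sendByte() and the third template parameter Delay are assumptions added so the frame can be inspected on a host; a real port policy would return the vendor's output register and a real delay policy would busy-wait one bit period.

```cpp
#include <stdint.h>

// Hypothetical port policy: a plain variable stands in for an output register.
struct MockPort0 {
    static volatile uint32_t out;
    static inline volatile uint32_t& reg() { return out; }
};
volatile uint32_t MockPort0::out = 0;

// Pin access bound at compile time to one port and one bit position.
template <class Port, unsigned Bit>
struct Pin {
    static inline void high() { Port::reg() = Port::reg() | (1u << Bit); }
    static inline void low()  { Port::reg() = Port::reg() & ~(1u << Bit); }
    static inline bool read() { return (Port::reg() & (1u << Bit)) != 0; }
};

typedef Pin<MockPort0, 3> TxPin;
typedef Pin<MockPort0, 4> RxPin;

// Hypothetical delay policy: instead of waiting, it samples TxPin once per
// bit time and stores the level, oldest sample in bit 0.
struct SamplingDelay {
    static uint32_t samples;
    static unsigned count;
    static void waitBit(unsigned long /*baud*/) {
        if (TxPin::read()) samples |= (1u << count);
        ++count;
    }
};
uint32_t SamplingDelay::samples = 0;
unsigned SamplingDelay::count = 0;

// Transmit-only software UART, 8 data bits, no parity, 1 stop bit.
// Rx is accepted but unused because receiving is left out of this sketch.
template <class Tx, class Rx, class Delay>
class SoftwareUart {
public:
    explicit SoftwareUart(unsigned long baud) : baud_(baud) {
        Tx::high();                               // idle line level is high
    }
    void sendByte(uint8_t byte) {
        Tx::low();                                // start bit
        Delay::waitBit(baud_);
        for (unsigned i = 0; i < 8; ++i) {        // data bits, LSB first
            if (byte & (1u << i)) Tx::high(); else Tx::low();
            Delay::waitBit(baud_);
        }
        Tx::high();                               // stop bit
        Delay::waitBit(baud_);
    }
    void send(const char* text) {
        while (*text) sendByte(static_cast<uint8_t>(*text++));
    }
private:
    unsigned long baud_;
};
```

Since every pin operation is a static inline function on a type, the compiler sees constant port and bit values at each call site, which is what should allow it to emit the same load, mask and store sequence that hand-written code would use.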

Our problem is that on some platforms the manufacturer's headers do not contain macros that define names for the addresses, only macros in which the addresses are already cast to pointers, and as such we can't use them as template parameters. E.g. PERIPHERY0BASE is not available, only PERIPHERY0.

My question is whether there is any workaround that would keep the efficiency (other than rewriting the register definitions). In C++11 I would think of using constexpr to create a function that obtains the address of the static structures, which could then be used as a template parameter. Unfortunately, we can't count on C++11 being available. Any ideas? Do we need to modify or write our own register definitions?

asked on Stack Overflow Feb 15, 2016 by Martin Skalský • edited Feb 16, 2016 by Martin Skalský

2 Answers


Sorry, but it's difficult to understand what you actually need. If I understand you correctly you need a generic way to get an offset in specific structure out of a pointer provided by the third party headers (assuming you know the alignment of the structure). If you claim you can achieve your goal with C++11 constexpr functions, try to use templates for C++03.

I suppose you need to introduce a higher-level wrapper that converts the pointer into an offset:

template <typename T, T ptr, unsigned TAlignmentMask>
struct AddrRetriever
{
    static const int value = (int)ptr & TAlignmentMask;
};

And then use:

typedef Periphery<
    AddrRetriever<
        volatile void*, // use the type of the pointer vendor provides
        PERIPHERY0, /* alignment mask for your platform */
    >::value
> Periphery0;
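
Another C++03-compatible direction, different from the AddrRetriever approach, is to wrap each vendor pointer in a small tag type and make the driver generic over the tag, so the pointer expression never has to be a template argument. The sketch below is only an illustration: the names Periphery0Tag, Periphery1Tag, PeripheryDriver and regs() are hypothetical, and two ordinary objects stand in for the hardware so it can run on a host.

```cpp
#include <stdint.h>

// Same layout as the vendor struct in the question.
struct Periphery {
    volatile uint32_t reg1;
    volatile uint32_t reg2;
};

// On a real target the vendor header would supply something like
//   #define PERIPHERY0 ((Periphery*)0x000000ab)
// Here two static objects stand in for the register blocks (assumption).
static Periphery fake_block0;
static Periphery fake_block1;
#define PERIPHERY0 (&fake_block0)
#define PERIPHERY1 (&fake_block1)

// Hypothetical tag types: each one hides a vendor pointer behind a static
// inline function.
struct Periphery0Tag {
    static inline Periphery* regs() { return PERIPHERY0; }
};
struct Periphery1Tag {
    static inline Periphery* regs() { return PERIPHERY1; }
};

// Driver that is generic over the tag instead of over an integer address.
template <class Tag>
struct PeripheryDriver {
    static inline void doSomething(int foo) {
        Tag::regs()->reg1 = static_cast<uint32_t>(foo);
    }
};

// typedefs for different peripheries of the same type
typedef PeripheryDriver<Periphery0Tag> Periphery0;
typedef PeripheryDriver<Periphery1Tag> Periphery1;
```

With optimization enabled, regs() is a trivial inline function, so the store would be expected to compile to the same instructions as the integer-address version; checking the generated assembly on the target compiler is the way to confirm that.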

As a side note, I'd like to recommend reading Practical Guide to Bare Metal C++. It will give you some ideas on implementing generic asynchronous timers, uarts, and other peripherals just like you want.

answered on Stack Overflow Feb 16, 2016 by Alex Robenko

You should always create a hardware abstraction layer around the hardware peripheral. Meaning that the caller shouldn't need to know or care about the bits and bytes of the registers. In other words, make a standard hardware driver for a given peripheral on a given MCU.

To handle multiple hardware peripherals of the same type, on the same chip, you usually take the first register's address as a parameter to keep them apart. It seems this is what your code is doing.
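
As a small illustration of passing the first register's address, the driver function below takes a pointer to the register block. The function name peripheryDoSomething is an assumption, and the struct mirrors the one in the question.

```cpp
#include <stdint.h>

// Register block with the same layout as the struct in the question.
struct Periphery {
    volatile uint32_t reg1;
    volatile uint32_t reg2;
};

// One driver function serves every instance of the peripheral; the caller
// selects the instance by passing the address of its first register.
inline void peripheryDoSomething(Periphery* base, uint32_t foo) {
    base->reg1 = foo;
}
```

On a target the call would look like peripheryDoSomething(PERIPHERY0, 5), with PERIPHERY0 coming from the vendor header.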

To take the abstraction level further, you can then create an abstract base class "UART", which holds generic functions for all UARTs, such as setting up the baud rate and communication format, setting up hardware handshaking if needed, send, receive and so on. All your UART drivers will then have to inherit the base class function interface.

Then the caller of the application doesn't need to know or care about how a specific hardware peripheral works on the given MCU. The caller application will be completely portable.
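
A minimal sketch of such a base class, with member names that are assumptions for illustration rather than an established API. LoopbackUart is a hypothetical host-side stand-in for an MCU-specific driver.

```cpp
#include <stddef.h>
#include <stdint.h>

// Abstract interface shared by all UART drivers.
class Uart {
public:
    virtual ~Uart() {}
    virtual void setBaudrate(unsigned long baud) = 0;
    virtual void send(uint8_t byte) = 0;
    virtual bool receive(uint8_t& byte) = 0;   // false when nothing is pending
};

// Stand-in driver: bytes that are sent can be read back in the same order.
class LoopbackUart : public Uart {
public:
    LoopbackUart() : baud_(0), head_(0), tail_(0) {}
    virtual void setBaudrate(unsigned long baud) { baud_ = baud; }
    virtual void send(uint8_t byte) {
        if (tail_ - head_ < kSize) {           // drop the byte when the buffer is full
            buf_[tail_ % kSize] = byte;
            ++tail_;
        }
    }
    virtual bool receive(uint8_t& byte) {
        if (head_ == tail_) return false;
        byte = buf_[head_ % kSize];
        ++head_;
        return true;
    }
    unsigned long baudrate() const { return baud_; }
private:
    enum { kSize = 16 };
    unsigned long baud_;
    size_t head_;
    size_t tail_;
    uint8_t buf_[kSize];
};

// Portable application code: it depends only on the Uart interface.
inline void sendText(Uart& uart, const char* text) {
    while (*text) uart.send(static_cast<uint8_t>(*text++));
}
```

Each call goes through a virtual function, so this trades some call overhead for an interface that can be swapped at run time.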

This is the standard way to do professional firmware design. Usually it is done in C, but with some care it will be possible to do in C++ as well, without creating too much dead weight (avoid templates).

answered on Stack Overflow Feb 16, 2016 by Lundin

User contributions licensed under CC BY-SA 3.0