Multi-threaded reference counting


I was just thinking about multi-threaded reference counting, searched for it and found many posts, that basicly only mention the problem of atomicity, many answers even here on stackoverflow miss the actual problems involved in multi-threaded reference counting. So what's the fundamental problem.

Let's assume an object type with a reference counter like

struct SomethingRefcounted {
    int refcount;
    // other stuff
} * sr;

Reference counting means, our reference counter shall equal the number of references to the object at all times, possibly being slightly higher temporarily during pointer assignments.

For all further code, I assume volatile and atomic operations.

Now, each time, we create a new reference, we do an implicit ++sr->refcount; each time we remove the reference, we do an implicit if (!--sr->refcount) free(sr);.

However, if one thread does the decrement and another thread tries the increment at the same time, we get a race, with can only be understood considering CPU registers.

SomethingRefcounted * sr1, * sr2;

int thread_1() {
    for (;;) sr1 = NULL;

int thread_2() {
    for (;;) sr2 = sr1;

int thread_3() {
    for (;;) sr1 = new SomethingRefcounted;

int main() {
    create the threads and let them run

The problem here is thread_2: The moment it reads the pointer sr1 into a CPU register, it violates the assumption, the refcounter correctly counts the number of references. The refcounter is 1 after the new assignment to sr1 but even ignoring thread_1 for a moment, once sr1 is read into a CPU register by thread_2, there are 2 reference to the object, first the variable sr1 and the CPU register of thread_2, but the refcounter is still only 1, violating our constraint from above. The following increment will fix it, but this fix will come too late, if thread_1 decrements it to 0 before.

There are solutions involving locks (global locks, maybe hashed for many objects sharing one lock out of a pool of locks, so not every object needs it's own lock but also there doesn't need to be a single application wide lock causing lock contention). One solution I came up with however is to xchg the pointer on read, so the constraint, refcounter >= number of references is enforced. thread_2 in assembly could then look like this:

    LOAD 0xdeadbeef -> reg1
    XCHG [sr1] <-> reg1
    CMP  reg1 <-> 0xdeadbeef
    INC [reg1]
    STORE reg1 -> [sr1]
    STORE reg1 -> [sr2] ; Actually, here, we have to do the
    ; refcount decrement on sr2 first, but you get the point

So, this is a simple spinlock, waiting for other concurrently running accesses like this to complete. Once we successfully XCHGed the pointer, no one else can get it, so we can be sure, the ref counter is at least 1 (it can't be 0, because we just found a reference to it) and it can't be decremented down to zero (even if there are more references, the one we have now, we have it exclusively in your CPU register, and that one contributes 1 to the refcounter, preventing it from reaching zero).

Similarily, thread_1 would look like this:

    LOAD 0xdeadbeef -> reg1
    XCHG [sr1] <-> reg1
    CMP  reg1 <-> 0xdeadbeef
    LOAD 0 -> reg2
    STORE reg2 -> [sr1]
    DEC [reg1]
    JUMPIFZERO free_it

Now, I am wondering if there are any more efficient solution to that problem. (Or even whether I miss something here).

asked on Stack Overflow Jul 2, 2020 by Bodo Thiesen

0 Answers

Nobody has answered this question yet.

User contributions licensed under CC BY-SA 3.0