Here’s a not-quite-as-cool-as-ConcurrentQueue-but-still-cool shared pointer I wrote:
Sometime during my work on the queue I realized that garbage collection would be a nightmare, as ownership of the buffers is essentially shared between one producer and all consumers. I felt that figuring out when it was okay to destroy a buffer would require synchronization mechanisms that interfered with the general push/pop performance.
I had some hopes, though, that there would exist a concurrent shared pointer in the std library that could solve my problem without being a performance burden. After some searching I found lots of references and half-implementations of it, but when it came down to it, there were only experimental bits and pieces that weren't available for use. (As I understand it, C++20 will add a thread-safe shared pointer to std, and there was, at some point, a working experimental implementation in the code.)
So I figured, why not build my own? I tried once and gave up because I couldn't figure out how to do the increments and decrements safely and in a lock-free manner. However, when the time came for my Specialization project at The Game Assembly, I wanted to give it another shot.
As I began, I had two criteria for the finished product: it would be lock free, and access to the shared object would have near-identical performance to a simple raw pointer.
As I began thinking about the lock-free control mechanism for copying a pointer object, I realized something I had half-learned in my previous attempt: I could not allow a thread to access the shared block haphazardly, as there would be no way to know when another thread was already in there decrementing it (potentially to zero). That meant that whatever control mechanism I invented had to live in the pointer objects themselves, locally.
My first breakthrough in the control mechanism was the realization that as long as the source of a copy operation hadn't decremented its corresponding shared block, that shared block (and the shared object) would stay alive. This meant that if I could somehow keep the source object alive until the increments had happened, I would be in the clear. This spawned a lot of my first ideas/drafts:
Store a sort of client list in the upper word of a pointer
Which would then be incremented on copy, as a sort of promise from the from-object to increment the shared block's ref counter on the client's behalf.
This failed, of course, as there was no telling when the actual increment would happen, and a client might choose to decrement the shared block before its corresponding increment (potentially killing the shared block ahead of time).
The second breakthrough came while I was reading a paper on an implementation of a lock-free arbitrary-length word. The idea was that lock-freedom can be guaranteed if all threads involved in an operation have to help out / try to make it succeed. This made me realize that I was sort of on the right track, but I would have to include not only the from-object's user threads, but the to-object's users as well. The solution: make the to-object's assigner, as well as the from-object's users, help out with the increments.
That's basically the main idea of the mechanism. In short: incrementing the COPY_REQUEST counter forces whichever thread wants to store a new value in the pointer object to help increment the shared block's ref counter, carrying out the increment on the copy-requester's behalf.
Fun bonus: to make this mechanism work I needed more storage space than just the 16 extra bits on top of a 64-bit pointer block. This led me to investigate Microsoft's Interlocked operations, which do in fact support one atomic operation on 128 bits: _InterlockedCompareExchange128. This uses the underlying 'lock cmpxchg16b' instruction, which is the only widely available 128-bit atomic instruction. This meant I had to wrap a whole bunch of operations around it, which resulted in the AtomicOWord class. Quite the utility.