Tue, Feb 07, 2023 at 12:54:16AM +0100, Udo Richter wrote:
Two-ended buffers are pretty good when used correctly, but nowadays they have a small chance of triggering memory ordering issues, where it is possible that written data to the buffer is still stuck in a distant cache, while the updated head pointer is already known to other CPU cores. To be absolutely safe, the head pointer would need atomic read/write, forcing ordering with a memory barrier.
A nice explanation. Before C++11 (ISO/IEC 14882:2011), to my understanding, multi-threaded programs or data races were not part of the standard. Starting with C++11, data races actually are undefined behavior, and not just "implementation defined". I would recommend the following resources:
https://en.cppreference.com/w/cpp/atomic/memory_order https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
The latter only covers loads and stores, not read-modify-write like std::atomic::fetch_add(), but it gives an idea how things look like in different processors.
But in general the timing for such a bug is pretty narrow, and advanced CPUs (esp. intel/amd) make sure that things get stored in order anyway.
With the IA-32 and AMD64 a.k.a. x86-64 ISA, the general expectation is that implementations use Total Store Order (TSO). This is also how the SPARC and the RISC-V RVTSO works.
However, on ARM (and POWER and RISC-V RVWMO), weak memory ordering is the norm. On the ARM, some atomic operations would use retry loops, or with the recent LSE (Large System Extensions), special instructions. It should be remembered that there are VDR systems running on ARM, like the Raspberry Pi.
Worst thing would also just be false data read, not a crash.
If that false data were a stale pointer that points to now-freed memory, you could get a crash. Or you could get a hang or an assertion failure catching an inconsistent state.
Somewhat related to this, I recently learned about Linux "restartable sequences" https://lwn.net/Articles/883104/ which could be an alternative to using mutexes in some lock-free data structures. I think it could work for things where data would be "published" by updating just one pointer at the end. I didn't think of an application of that in VDR yet, and I am not sure if there is any. It is better to keep things simple if the performance is acceptable.
Marko