Accelerating std::copy_if using SIMD

https://lobste.rs/rss Hits: 8
Summary

Introduction#I have a Zen 4 CPU with a bunch of AVX512 feature flags. So I thought - let’s try and use it to implement something, even if it’s in the realm of wheel-reinvention. I started with the following goals.Implement an algorithm that cannot be vectorized by my optimizing compiler, even with a polyhedral loop model.Systematically analyze its performance and answer the questionsIs it as fast as it can be?If not, why? And how can we fix it?Start simple, make it work.Which means that dead simple algorithms like map/transform, reduce, adjacent_difference etc are out, as they are very autovectorizable. Even 2D stencils are out because look at this. So, I settled on std::copy_if.Implementing a SIMD implementation is the easy part. Figuring its perforamnce out ended up being less trivial than I anticipated. I already knew the tools that I will need.Google benchmark for writing microbenchmarkslikwid-bench for determining performance upper bound on my machinellvm-mca for simulating the kernel on its model of Zen 4perf-stat for drill-down performance analysis by counting eventsFrom cppreference, std::copy_if is a dead-simple algorithm.template<class InputIt, class OutputIt, class UnaryPred> OutputIt copy_if(InputIt first, InputIt last, OutputIt d_first, UnaryPred pred) { for (; first != last; ++first) if (pred(*first)) { *d_first = *first; ++d_first; } return d_first; } The codegen is also very clean (compiler explorer link). It is however non-trivial to vectorize because of a loop-carried dependency: the value of d_first in iteration i+1 depends on the value of pred(*first) in iteration i. Let us measure our baseline before we go about vectorizing.These are the dimensions along with we can measure performance.Input size (henceforth n)Choice of predicate functionInput distributionInput entropyThe problem size (1) is trivial to sweep over; varying n results in different interactions with the memory subsystem (caches, hardware prefetchers, DRAM etc).The predicate and dist...

First seen: 2026-05-26 09:31

Last seen: 2026-05-26 16:37