Two Studies in Compiler Optimisations

https://news.ycombinator.com/rss Hits: 38
Summary

Introduction

While many performance-oriented programmers are intimately acquainted with the almost preternatural ability of modern compilers to optimise their code, and many of us have spent countless hours on Compiler Explorer examining the differences between the Assembly generated by different versions of gcc and clang, most have likely not looked under the hood to see how the magic happens. It is a testament to their quality that most of us simply treat compilers as black boxes: more or less readable code goes in, fast binaries come out.

Sometimes, however, seemingly innocuous changes (perhaps even meant to help the compiler) can cause surprising performance issues which we are hard-pressed to explain without a deeper understanding of the underlying machinery. In this post we’ll dive into the implementation of some of the LLVM optimisation passes using two simple examples which, nonetheless, will help us pull back the veil on the complexity involved in producing highly-optimised code. We will see how small source changes can trigger different paths in the compiler’s internal processing with unexpected consequences, demonstrating how achieving high performance can be as much an art as it is a science for both compiler developers and users.

I have also included a few exercises for those interested in getting their hands dirty, but they are not required to follow along with the main text.

I use LLVM 22.1.0 as the reference implementation throughout this post. The examples are written in (very basic) C++23 and target x86-64, and the Assembly code uses Intel syntax. Prior knowledge of LLVM IR is not required but it can be helpful (I recommend A Gentle Introduction to LLVM IR).
Case 1: Modular increment

The scenario

Consider the following C++ function to get the next index into an array or vector of elements accessed in a round-robin fashion, with cur being the current index and count the number of elements:

unsigned next_naive(unsigned cur, unsigned c...

First seen: 2026-03-26 01:57

Last seen: 2026-03-27 13:26