Taming P99s in OpenFGA: How we built a self-tuning strategy planner

Summary

Operating a latency-critical system means the inevitable work of reducing tail latency. Tail latency refers to the response time experienced by the slowest requests (the outliers), rather than the average. Since authorization happens on every request, these decisions must be fast; otherwise, they directly add overhead to the total response time. For OpenFGA, an open-source authorization system modeled after Google's Zanzibar that powers Auth0 FGA, this challenge manifests in its most critical operation: Check. Answering "Can user X access resource Y?" requires traversing relationship graphs. In this context, traversal performance isn't just a feature; it is the fundamental constraint of the system's architecture.

In our quest to reduce latency for the Check API, we initially developed multiple graph traversal strategies tailored to specific data distributions. Our early iterations selected these strategies statically based on the graph node's complexity, so they lacked the context to determine whether a specific strategy would actually outperform the default traversal algorithm for a given dataset.

We needed a way to consistently select the optimal path based on performance data, not just static rules. This led to the development of a dynamic, self-tuning planner that learns from production latency in real time. Every node in a customer's graph has its own complexity, which varies by type of operations, operation count, data cardinality, and subgraph distribution. The planner therefore treats each node independently, applying the most effective strategy for that specific point in the traversal.

This post details the algorithmic framework chosen for the self-tuning planner and the methodology used to calibrate the probabilistic distributions for each traversal strategy.
We will examine how this architecture creates an extensible feedback loop, allowing us to continuously inject new, pre-tuned strategies into the decision engine (the planner), improving even more the performan...
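
To make the idea of a per-node planner that learns from observed latency more concrete, here is a minimal sketch in Go. The summary does not name the algorithm, so this sketch assumes a simplified Gaussian form of Thompson sampling: each strategy at each node keeps a running mean and variance of its latency, the planner draws one plausible mean per strategy, and it picks the lowest draw. All names here (`Planner`, `Choose`, `Record`, `simulate`, the node keys, and the latency figures) are hypothetical and are not taken from the OpenFGA codebase.

```go
package main

import (
	"math"
	"math/rand"
)

// strategyStats tracks observed latencies (ms) for one strategy at one node,
// using Welford's online algorithm for the mean and variance.
type strategyStats struct {
	n    float64 // number of observations
	mean float64 // running mean latency
	m2   float64 // sum of squared deviations from the mean
}

func (s *strategyStats) observe(latencyMs float64) {
	s.n++
	d := latencyMs - s.mean
	s.mean += d / s.n
	s.m2 += d * (latencyMs - s.mean)
}

// sample draws a plausible mean latency from a Normal belief about the mean.
// Strategies with fewer than two observations return -Inf so they are tried first.
func (s *strategyStats) sample(rng *rand.Rand) float64 {
	if s.n < 2 {
		return math.Inf(-1)
	}
	variance := s.m2 / (s.n - 1)
	stderr := math.Sqrt(variance / s.n)
	return s.mean + rng.NormFloat64()*stderr
}

// Planner is a hypothetical per-node strategy selector (not safe for concurrent use).
type Planner struct {
	strategies []string
	stats      map[string][]*strategyStats // node key -> stats per strategy
	rng        *rand.Rand
}

func NewPlanner(strategies []string, seed int64) *Planner {
	return &Planner{
		strategies: strategies,
		stats:      map[string][]*strategyStats{},
		rng:        rand.New(rand.NewSource(seed)),
	}
}

// nodeStats lazily creates independent statistics for each graph node.
func (p *Planner) nodeStats(node string) []*strategyStats {
	st, ok := p.stats[node]
	if !ok {
		st = make([]*strategyStats, len(p.strategies))
		for i := range st {
			st[i] = &strategyStats{}
		}
		p.stats[node] = st
	}
	return st
}

// Choose returns the index of the strategy with the lowest sampled latency.
func (p *Planner) Choose(node string) int {
	best, bestSample := 0, math.Inf(1)
	for i, s := range p.nodeStats(node) {
		if v := s.sample(p.rng); v < bestSample {
			best, bestSample = i, v
		}
	}
	return best
}

// Record feeds an observed latency back into the planner.
func (p *Planner) Record(node string, strategy int, latencyMs float64) {
	p.nodeStats(node)[strategy].observe(latencyMs)
}

// simulate runs the choose/record loop against made-up mean latencies with
// +/-20% jitter and returns how often each strategy was chosen.
func simulate(p *Planner, node string, trueMeanMs []float64, rounds int) []int {
	counts := make([]int, len(trueMeanMs))
	for i := 0; i < rounds; i++ {
		k := p.Choose(node)
		latency := trueMeanMs[k] * (0.8 + 0.4*p.rng.Float64())
		p.Record(node, k, latency)
		counts[k]++
	}
	return counts
}
```

Calling `simulate(NewPlanner([]string{"default", "alternative"}, 1), "document#viewer", []float64{10, 2}, 1000)` should send the large majority of rounds to the second strategy, because its simulated latency is lower. Keying the statistics by node keeps each node's decision independent, in line with the per-node treatment described above. A production planner would also need concurrency control and a more careful prior than this sketch uses.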

First seen: 2026-01-22 20:45

Last seen: 2026-01-26 21:59