Hello, and thanks in advance for reading. This is not a bug report. It is a case-study write-up of a recovery effort on a severely corrupted 12 TB multi-device pool, shared here in case any of the observations are useful to btrfs-progs development. The goal is constructive, not a complaint.

One paragraph summary

A hard power cycle on a 3-device pool (data single, metadata DUP, DM-SMR disks) left the extent tree and free space tree in a state that no native repair path could resolve. A subsequent btrfs check --repair run then entered an infinite loop of 46,000+ commits with zero net progress, rotating the 4 backup_roots slots past any pre-crash rollback point. Recovery eventually succeeded with a set of 14 custom C tools built against the internal btrfs-progs API, with a final data loss of about 7.2 MB out of 4.59 TB (roughly 0.00016 percent). The pool is now fully operational.

Full analysis

I wrote the case up in a structured format covering the environment, the timeline, root-cause classification, the bulletproof safety criterion we derived empirically, and 9 specific areas where a relatively small upstream change would have removed the need for most of the custom tooling.

https://github.com/msedek/btrfs_fixes/blob/main/INCIDENT-ANALYSIS.md

The nine proposed improvement areas, in order of expected impact for operators hitting similar cases:

A. Progress detection in btrfs check --repair, so that a 46,000-commit loop aborts with a clear message instead of destroying backup_roots.
B. Symmetric handling of BTRFS_ADD_DELAYED_REF in reinit_extent_tree, matching the existing BTRFS_DROP_DELAYED_REF exemption.
C. A sibling safety precheck in the btrfs_del_items rebalance, so that draining a leaf below LEAF_DATA_SIZE/4 does not trigger push_leaf_left on a stale, shareable sibling.
D. Supervised EEXIST handling in alloc_reserved_tree_block with three explicit modes (error, silent, update).
E. A btrfs rescue rebuild-extent-tree subcommand that operates from a pre-scanned ref list, as an alternative to the currently deadloc...
First seen: 2026-04-06 04:44
Last seen: 2026-04-06 05:44