In Part 1, I described how duplicating a block of seven middle layers in Qwen2-72B — no weight changes, no training — produced the #1 model on the HuggingFace Open LLM Leaderboard. The method, which I called RYS (Repeat Your Self), was discovered using nothing but hard math probes and EQ-Bench on a pair of RTX 4090s.

That was mid-2024. Since then, a flood of strong open-source models has arrived — Qwen3.5, MiniMax, GLM-4.7, and others — and I finally have enough compute at home to scan them properly.

So the question driving this post is simple: was RYS a fluke of Qwen2-72B, or is it a general property of Transformers? More specifically:

- Does relayering still help on stronger modern models?
- Which modifications actually earn their extra layers?
- If two good motifs help independently, do they stack?

The short answer is yes, relayering survives. The longer answer took 3,024 beam search candidates, a surrogate model scoring 2 million configurations, and a unified validation sweep to work out properly. Along the way, I also released the scanning code and a set of new RYS models.

Let's get into it!

Why Qwen3.5-27B

The Qwen3.5 family dropped around Chinese New Year 2026 and immediately became the darling of the LocalLLaMA crowd. Strong benchmarks, good vibes, well engineered.

I'm most interested in models over 200B — that's what my dual Grace Hopper system is built for — but the broader community runs smaller models, and the 27B size hits a sweet spot: large enough to have interesting internal structure, small enough that most people with a decent GPU can actually use a RYS variant.

There's also a scientific reason. In Part 1, I noted that smaller models tend to have more entangled functional anatomy — encoding, reasoning, and decoding are less cleanly separated. If RYS still works on a 27B model, that tells us the circuit structure is robust even when the brain is more compact. If it doesn't work, that's also interesting.

(MiniMax M2.5 and others are in the pipeline.)
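For concreteness, the block-duplication operation at the heart of RYS can be sketched in a few lines. This is a hypothetical illustration over a generic list of layers — the name `repeat_block` and the toy indices are mine, not the released scanning code:

```python
import copy

def repeat_block(layers, start, end):
    """Return a new layer stack with layers[start:end] duplicated in place.

    `layers` can be any sequence of decoder layers. The second copy is
    deep-copied, so the resulting model has real extra parameters with
    identical weights and no training. Illustration only, not the RYS code.
    """
    block = [copy.deepcopy(layer) for layer in layers[start:end]]
    return list(layers[:end]) + block + list(layers[end:])

# Toy example: a 40-"layer" stack with a seven-layer middle block repeated.
stack = list(range(40))
relayered = repeat_block(stack, 20, 27)  # 47 "layers" after duplication
```

On a real HuggingFace-style model, the same idea would apply to something like `model.model.layers`, and you would also need to update the layer count in the model config so downstream tooling agrees with the new depth.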