1% vs. 67%: What happened when we stopped trusting embeddings alone

https://news.ycombinator.com/rss Hits: 2
Summary

Chroma's research last July confirms what we've been building around: bigger context windows don't solve the retrieval problem. They tested 18 LLMs, including Claude, GPT-4.1, and Gemini 2.5. Performance degrades regardless of how much you can fit in.

Here's the part nobody talks about: retrieval and generation are decoupled. The retriever finds "relevant" chunks. The generator uses them. But nothing connects what got retrieved to whether the answer actually helped. Your AI pulls up a memory. Uses it. Gets it wrong. And then... nothing. That memory sits there, waiting to surface again. Same confidence. Same ranking. No feedback. Where's the feedback loop?

The Learning Gap

Rerankers, query rewriting, hybrid search - they all try to fix retrieval quality at query time. But they can't learn from outcomes. They optimize for similarity, not success.

We built outcome-based learning. When you say "that worked" or "no, that's wrong," those signals attach to the memories themselves. Good memories surface more. Bad ones sink. It's not complicated. It's just not how anyone else builds this.

Three Problems We Had to Solve

Problem 1: Cold Start

A new memory helps once: 1/1 = 100% success rate. A veteran memory has helped 90 times out of 100: 90/100 = 90%. Raw math says the new one is better. That's insane.

The Wilson score fixes this by asking: how much do I actually trust this number? One data point? Could be luck. A hundred data points? That's a pattern. So 9/10 and 90/100 are both 90% raw, but Wilson scores them at ~60% and ~83% respectively. More evidence, higher floor. Memories have to prove themselves. (A sketch of the scoring math follows below.)

Problem 2: When to Trust What

New memories have no track record. You can't rank them by outcome because there's no outcome yet. But if you only trust embeddings, you're back to semantic similarity. No learning.

We use dynamic weighting. As memories get used and scored, feedback shifts the balance (second sketch below):

          Embedding similarity   Outcome-based learning
  New     80%                    20%
  Proven  20%                    80%

Trust is earned,...
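The post doesn't include code, so here is a minimal sketch of the Wilson lower bound it describes. The helper name `wilson_lower_bound` and the 95% confidence level (z = 1.96) are assumptions for illustration, not details from the post.

```python
import math

def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a success rate.

    z = 1.96 is roughly 95% confidence (an assumed default; the post
    does not say which confidence level it uses).
    """
    if trials == 0:
        return 0.0  # no evidence yet: trust nothing
    p = successes / trials
    denom = 1 + z * z / trials
    center = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (center - margin) / denom

# The numbers from the post: same raw rate, very different floors.
print(wilson_lower_bound(1, 1))     # ~0.21 -- one lucky hit proves little
print(wilson_lower_bound(9, 10))    # ~0.60
print(wilson_lower_bound(90, 100))  # ~0.83
```

Run it and the cold-start fix falls out: the 1/1 memory scores ~0.21 despite its 100% raw rate, while 90/100 earns ~0.83.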
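And a sketch of the dynamic weighting table. The post only gives the two endpoints (80/20 for new, 20/80 for proven), so the linear ramp and the `maturity` cutoff of 50 feedback events are illustrative assumptions.

```python
def blended_score(similarity: float, outcome: float, feedback_count: int,
                  maturity: int = 50) -> float:
    """Blend embedding similarity with an outcome-based score.

    New memories (no feedback) lean 80/20 on similarity; proven ones
    (feedback_count >= maturity) lean 20/80 on outcomes. The linear
    ramp and maturity=50 are assumptions, not the post's schedule.
    """
    t = min(feedback_count / maturity, 1.0)   # 0.0 = brand new, 1.0 = proven
    sim_weight = 0.8 - 0.6 * t                # 0.8 -> 0.2 as trust is earned
    return sim_weight * similarity + (1 - sim_weight) * outcome

# A new memory is ranked mostly on similarity...
print(blended_score(similarity=0.9, outcome=0.2, feedback_count=0))    # 0.76
# ...a proven one mostly on its track record.
print(blended_score(similarity=0.9, outcome=0.83, feedback_count=90))  # 0.844
```

The `outcome` input here would naturally be the Wilson lower bound from the first sketch, so the two fixes compose: evidence raises the outcome score, and accumulated feedback raises its weight.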

First seen: 2026-01-12 07:00

Last seen: 2026-01-12 08:00