2026-05-21

Kubernetes In Anger

NOTE: Any discussions can be had on Lobsters

0. Quick start (emergency edition)

Is this the right guide?

YES, if:
- You’re debugging a live EKS production issue
- You need to upgrade/change EKS safely
- You want to prevent common EKS outages
- You’re oncall for EKS workloads

NO, if:
- You’re learning Kubernetes basics (try the official tutorials first)
- You need EKS setup instructions (use AWS documentation)
- You want comprehensive Kubernetes reference (use kubernetes.io)

Emergency shortcuts

- Cluster is on fire right now? → Jump to Section 2.10 Tier-0 Incident Playbook
- Need to upgrade safely? → Jump to Section 8 Upgrades and maintenance
- Investigating an incident? → Start with Section 1.2 Quick Cluster Health Snapshot

Prerequisites

This guide assumes you know:
- Basic kubectl commands (get, describe, logs)
- AWS CLI basics
- What pods, services, and deployments are
- How to read YAML manifests

What makes EKS different

EKS is not “just Kubernetes”. Key differences that affect reliability:
- Pods get real VPC IPs (AWS VPC CNI)
- AWS services become dependencies (NAT, NLB, EBS)
- Node limits are AWS EC2 limits
- Networking failures look like application failures
- Upgrades affect multiple AWS components

Introduction

On running infrastructure

There’s a common way of thinking about Kubernetes that goes something like this: you declare what you want, the system converges toward it, and your job is mostly done. Write the YAML, apply it, the scheduler places your pods, the controllers reconcile state, and everything just works. This is roughly true until it isn’t.

The thing about Kubernetes, and EKS specifically, is that it doesn’t fail like a monolith fails. A monolith crashes and you know it. EKS degrades. DNS gets slow. A node hits a network limit you didn’t know existed. Pods keep running but their connections reset every 6 minutes. The dashboard is green. Customers are complaining.
You’re staring at healthy pods wondering what’s wrong with your application, when the r...