Scalability Isn't a Feature, It's a Tax: Lessons from 20K CCU


At 11:47 PM on a Friday, we crossed 20,000 concurrent users for the first time.
By 11:52 PM, our tournament bracket service was returning 503s.
The pods weren't down. CPU was at 34%. Memory was fine. The load balancer showed healthy upstreams. Nothing in our dashboards was red. But players couldn't load brackets, match assignments were timing out, and the support queue was filling with angry tournament organizers.
The culprit? A Postgres connection pool we'd never stress-tested under real concurrency. Not CPU. Not memory. Not network. A connection counter hitting its ceiling at the exact worst moment.
That night taught me more about scalability than any book or conference talk. The specific kind of lesson you only get when real money is on the line at midnight.
Most engineers think about scalability like this:
More load → add more pods → problem solved.
This works until it doesn't. And when it doesn't, it fails in the most confusing, non-obvious ways.
The model is wrong because it conflates throughput (requests per second) with scalability (the ability to handle growth without redesign). You can have high throughput and terrible scalability. You can have low throughput and near-infinite scalability.
What actually breaks at scale is never what you tested.
After scaling an eSports platform from 200 to 20,000 CCU, I've found three ceilings that silently kill services long before CPU or memory become the bottleneck.
The first ceiling is connections. Postgres, Redis, your downstream gRPC services — they all have connection limits. Pods are stateless. Connections are not.
When you scale from 5 pods to 50, each new pod brings its own connection pool, so your total connection count multiplies by 10x. If each pod holds 20 connections and you had 5 pods, you had 100 connections. At 50 pods, you have 1,000. Postgres's default max_connections is 100. You just blew past it by 10x.
The fix isn't "increase max_connections." That's duct tape. The actual fix:
`connections_per_pod = (db_max_connections / max_pods) * 0.8` — always leave 20% headroom for admin queries and migrations.

We went from 1,000 connections at 50 pods down to 30 actual connections via PgBouncer. Tournament bracket queries dropped from 2,400ms p99 to 180ms p99 under the same load.
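That sizing rule is simple enough to encode as a pre-deploy check. A minimal sketch (the function name and defaults are illustrative, not from our codebase):

```python
def connections_per_pod(db_max_connections: int, max_pods: int,
                        headroom: float = 0.2) -> int:
    """Per-pod pool size that keeps total connections under the DB ceiling,
    reserving `headroom` (20% by default) for admin queries and migrations."""
    budget = db_max_connections * (1 - headroom)
    return max(1, int(budget // max_pods))

# With Postgres's default max_connections=100 and 50 pods, each pod may
# hold only a single connection — a strong hint you need PgBouncer.
print(connections_per_pod(100, 50))   # 1
print(connections_per_pod(500, 50))   # 8
```

Running this in CI against your planned max replica count catches the ceiling before autoscaling does.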
The second ceiling is hot keys. Your cache hit rate is 95%? Great. But what's the hit rate on your top 10 keys?
During a major tournament, every player in the bracket is reading the same match state, the same leaderboard, the same bracket structure — simultaneously. A single Redis key can receive 10,000 reads per second when 20,000 players refresh at the same time.
Redis is single-threaded per shard. One hot key → one thread maxed → all other reads on that shard queue up → latency spikes for everyone, not just the hot key readers.
Solutions, in order of complexity:
- Key sharding: `bracket:{id}:shard:{hash(userId) % N}` — spread load across N keys

We went with a local in-process cache: an LRU per pod, 500ms TTL, max 1,000 entries. Bracket read latency at 20K CCU stabilized at 12ms p99. Redis CPU dropped 60%.
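A per-pod LRU with TTL is small enough to sketch in full. This is an illustrative version of the approach, not our production code; the defaults mirror the numbers above (500ms TTL, 1,000 entries):

```python
import time
from collections import OrderedDict

class LocalCache:
    """Per-pod LRU cache with TTL — absorbs hot-key reads before they hit Redis."""

    def __init__(self, max_entries: int = 1000, ttl: float = 0.5):
        self.max_entries = max_entries
        self.ttl = ttl
        self._data: OrderedDict = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        item = self._data.get(key)
        if item is None or item[0] < time.monotonic():
            self._data.pop(key, None)
            return None  # miss or expired: caller falls back to Redis
        self._data.move_to_end(key)  # mark as recently used
        return item[1]

    def set(self, key, value):
        self._data[key] = (time.monotonic() + self.ttl, value)
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used

cache = LocalCache()
cache.set("bracket:42", {"round": 3})
print(cache.get("bracket:42"))  # {'round': 3}
```

The 500ms TTL is the trade: every pod may serve bracket state up to half a second stale, in exchange for Redis seeing one read per pod per window instead of one per player.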
The third ceiling, concurrent writes, is the hardest. It's not a performance problem — it's a correctness problem that only manifests at scale.
When two players' results arrive simultaneously for the same match, who wins the write? When a bracket update races against a leaderboard recalculation, which one is authoritative?
At 100 CCU, these races are theoretical. At 20,000 CCU, they happen 50 times per minute.
Patterns that work:
Optimistic locking with version columns:
```sql
UPDATE matches
SET score = $1, version = version + 1
WHERE id = $2 AND version = $3
```

If the version doesn't match, the write is rejected and the caller retries. Zero locks held, zero deadlock risk.
Event sourcing for tournament state: never update in place, always append. The state is derived from the event log. No two events can conflict — ordering is determined by the log sequence.
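The append-and-derive shape is easy to show in miniature. A sketch under invented event names (a Python list stands in for the durable log; list order stands in for log sequence):

```python
events = []  # append-only log; position in the list is the log sequence

def append(event_type: str, **payload):
    events.append({"type": event_type, **payload})

def bracket_state():
    """Derive current state by folding over the log — never updated in place."""
    results = {}
    for e in events:
        if e["type"] == "match_reported":
            results[e["match_id"]] = e["winner"]
    return results

append("match_reported", match_id="m1", winner="alice")
append("match_reported", match_id="m1", winner="bob")  # later event wins by log order
print(bracket_state())  # {'m1': 'bob'}
```

Two "conflicting" reports don't conflict at all: the log assigns them an order, and the fold resolves it deterministically.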
Idempotency keys on every state mutation that can be retried. This is non-negotiable. A retry without idempotency at 20K CCU will create duplicate match results, duplicate transactions, and duplicate entries your data team will never fully trust again.
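The mechanism itself is small: remember which mutation keys have already been applied. A sketch — in production the seen-key set must be durable (a unique index or Redis SETNX), not an in-memory set as here:

```python
seen_keys = set()  # must be durable in production: DB unique index or Redis SETNX
applied = []       # stand-in for the real side effect (a DB write)

def report_result(idempotency_key: str, match_id: str, score: int) -> bool:
    """Apply the mutation once; silently absorb retries carrying the same key."""
    if idempotency_key in seen_keys:
        return False  # duplicate delivery — already applied, do nothing
    seen_keys.add(idempotency_key)
    applied.append((match_id, score))
    return True

report_result("req-abc", "m1", 3)
report_result("req-abc", "m1", 3)  # client retry — not applied twice
print(len(applied))  # 1
```

The client generates the key once per logical mutation and reuses it on every retry; the server treats "already seen" as success, not as an error.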
Here's the heuristic I use now:
A system is scalable if doubling the load requires less than doubling the resources, and requires no redesign.
Note: no redesign. Not "no tuning." Tuning is expected. Redesign is failure.
Using this definition:
| System | Scalable? | Why not |
|---|---|---|
| Stateless HTTP service behind a load balancer | ✓ | |
| Postgres with PgBouncer + read replicas | ✓ | |
| Singleton in-memory rate limiter per pod | ✗ | state doesn't share across pods |
| Direct DB calls from Lambda without connection pooling | ✗ | connection ceiling |
| Kafka consumers with idempotent handlers | ✓ | |
| Global distributed lock for every write | ✗ | lock contention at scale |
You won't see these ceilings coming unless you're measuring the right things. Most teams aren't.
Add these metrics before your next traffic spike:
```
# Connection pool saturation — would have saved us 40 minutes at midnight
pg_pool_connections_active / pg_pool_connections_max

# Redis key access rate — catches hot key problems before they become incidents
sum(rate(redis_commands_total[5m])) by (key_prefix)

# gRPC deadline exceeded rate — surfaces distributed timeout chains
sum(rate(grpc_server_handled_total{code="DeadlineExceeded"}[5m])) by (grpc_method)

# Database query p99 — establish your baseline now, not during an incident
histogram_quantile(0.99,
  sum(rate(pg_query_duration_seconds_bucket[5m])) by (le, query_type)
)
```

When our bracket service started returning 503s, we had none of these. We flew blind for 40 minutes before identifying the connection pool. With these four queries on a Grafana dashboard, it would have been 3 minutes.
If you're running a service that will face 10x current load in the next year, do these three things this week:
1. Audit every stateful resource connection. Count pods × connections_per_pod. Compare against the resource's max_connections. If you're above 60% at current load, fix it now, not when you're at 90%.
2. Add a hot key detector. Log every cache key access. Run a frequency count over the last hour. Any key hit more than 1,000 times per minute needs a local cache layer in front of it.
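Step 2 can start as a one-pass frequency count over your access log before it becomes a real detector. A sketch — the log format (one key string per access) is an assumption:

```python
from collections import Counter

def hot_keys(access_log: list[str], threshold_per_min: int = 1000,
             window_minutes: int = 60) -> list[tuple[str, int]]:
    """Return keys whose access count over the window exceeds the
    per-minute threshold, hottest first."""
    counts = Counter(access_log)
    limit = threshold_per_min * window_minutes
    return [(k, n) for k, n in counts.most_common() if n > limit]

# One hour of accesses: one bracket key read far more than everything else.
log = ["bracket:7"] * 90_000 + ["leaderboard:3"] * 5_000
print(hot_keys(log))  # [('bracket:7', 90000)]
```

Anything this flags is a candidate for the local cache layer from the hot-key section.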
3. Add version columns to every table with concurrent writers. One migration, run in a transaction, zero downtime. This buys you optimistic locking for free and gives you a foundation to build idempotency on.
These aren't architectural overhauls. They're 1–3 day tasks each. And they're the difference between a Friday night that's boring and one that isn't.
The 20K CCU night was expensive — in stress, in lost player trust, in two engineers debugging until 3 AM. But the system we built afterward handled 30K CCU at the next major tournament without a single alert firing.
The tax was worth paying. It's always better to pay it in preparation than in production.