System Design Interview Basics

Core concepts and patterns you need for system design interviews, explained simply.

What interviewers are looking for

System design interviews test your ability to think through trade-offs, not memorize architectures. Interviewers evaluate:

Requirements gathering — Can you ask the right clarifying questions?
High-level design — Can you sketch a reasonable architecture?
Deep dives — Can you reason about specific components in detail?
Trade-offs — Can you articulate why you chose one approach over another?

The framework (use this every time)

Step 1: Clarify requirements (3-5 minutes)

Never start designing immediately. Ask:

Functional: What does the system do? What are the core features?
Scale: How many users? How many requests per second?
Constraints: Latency requirements? Consistency requirements?
Non-functional: Availability target? Data retention?

Example: "Design a URL shortener"

Clarifying questions:
- How many URLs shortened per day? → 100M
- How many redirects per day? → 10B (100:1 read/write ratio)
- Max URL length? → 2048 characters
- Custom aliases allowed? → Yes
- Analytics needed? → Click count, basic stats
- Expiration? → Optional, default 5 years

Step 2: Estimate scale (2-3 minutes)

Back-of-envelope math shows you understand the problem’s magnitude:

Writes: 100M / day = ~1200/sec
Reads: 10B / day = ~115,000/sec
Storage: 100M × 500 bytes (avg) = 50 GB/day, 18 TB/year

This tells you: read-heavy, needs caching, moderate write throughput.
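Here is the same arithmetic as a small Python helper, which makes it easy to redo the numbers for other inputs. This is a sketch. `estimate` is a hypothetical name, and 500 bytes per record is the assumed average from above. The retained-storage figure uses the 5-year default expiration from the requirements and ignores replication and index overhead.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate(writes_per_day: int, reads_per_day: int,
             bytes_per_record: int, retention_years: int) -> dict:
    """Back-of-envelope throughput and storage figures (decimal GB/TB)."""
    storage_per_day = writes_per_day * bytes_per_record  # bytes written per day
    return {
        "writes_per_sec": writes_per_day / SECONDS_PER_DAY,
        "reads_per_sec": reads_per_day / SECONDS_PER_DAY,
        "storage_gb_per_day": storage_per_day / 1e9,
        "storage_tb_per_year": storage_per_day * 365 / 1e12,
        "storage_tb_retained": storage_per_day * 365 * retention_years / 1e12,
    }


# Inputs from the URL-shortener example: 100M writes/day, 10B reads/day.
est = estimate(100_000_000, 10_000_000_000, 500, retention_years=5)
for name, value in est.items():
    print(f"{name}: {value:,.1f}")
```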

Step 3: High-level design (5-10 minutes)

Sketch the major components:

Client → Load Balancer → API Servers → Cache (Redis)
                                    → Database
                                    → Analytics Queue → Analytics Service

Explain each component’s role and why it’s there.
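To make the roles concrete, here is a sketch of the redirect path through that diagram. Plain dicts stand in for Redis and the database, and a `deque` stands in for the analytics queue. `handle_redirect` and the sample short code are hypothetical.

```python
from collections import deque
from typing import Optional

# In-memory stand-ins (assumptions): dict for Redis, dict for the database,
# deque for the analytics queue.
cache: dict[str, str] = {}
db: dict[str, str] = {"abc123": "https://example.com/some/long/path"}
analytics_queue: deque = deque()


def handle_redirect(short_code: str) -> Optional[str]:
    """Return the long URL for short_code, or None if the code is unknown."""
    long_url = cache.get(short_code)        # 1. try the cache first
    if long_url is None:
        long_url = db.get(short_code)       # 2. cache miss: read the database
        if long_url is None:
            return None                     # unknown code: caller returns 404
        cache[short_code] = long_url        # 3. populate the cache for next time
    analytics_queue.append(short_code)      # 4. record the click for async processing
    return long_url
```

The first call for `"abc123"` reads the database and fills the cache. Later calls are served from the cache. Pushing the click onto the queue keeps analytics work off the redirect's critical path.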

Step 4: Deep dive (15-20 minutes)

Pick 2-3 components and go deep. The interviewer may direct you.

Core concepts

Load Balancing

Distributes traffic across multiple servers.

Algorithms:

Round robin — Simple rotation (works when servers are identical)
Least connections — Routes to the server with fewest active connections
Weighted — Sends more traffic to more powerful servers
IP hash — Same client always hits the same server (useful for sessions)
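Below is a minimal sketch of the four algorithms as in-process pickers. The class names are hypothetical, and a real load balancer also tracks server health. IP hash uses `hashlib` rather than Python's built-in `hash()`, because string hashing is randomized per process and would send the same client to a different server after a restart.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotation."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1            # caller calls release() when the connection closes
        return server

    def release(self, server):
        self.active[server] -= 1


class WeightedRoundRobin:
    """A server with weight w appears w times per rotation.

    Naive expansion sends each server's share in a burst; production
    balancers interleave more smoothly.
    """
    def __init__(self, weights):
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self):
        return next(self._cycle)


class IPHash:
    """Map each client IP to a fixed server while the server list is unchanged."""
    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]
```

With IP hash, adding or removing a server changes `len(self.servers)`, which remaps most clients to different servers.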

Layer 4 vs Layer 7:

L4: Routes based on IP/port (faster, less intelligent)
L7: Routes based on HTTP content — URL path, headers, cookies (more flexible)
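An L7 balancer parses the HTTP request, so it can route on content that an L4 balancer never sees. Below is a sketch of longest-prefix path routing. The pool names and prefixes are hypothetical. Proxies such as nginx or HAProxy express the same idea in configuration rather than in code.

```python
# Hypothetical backend pools keyed by URL path prefix.
ROUTES = {
    "/api/": ["api-1:8080", "api-2:8080"],
    "/static/": ["static-1:8080"],
    "/": ["web-1:8080", "web-2:8080"],
}


def choose_pool(path: str) -> list[str]:
    """Return the pool whose prefix is the longest match for the request path."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not matches:
        raise ValueError(f"no route for {path!r}")
    return ROUTES[max(matches, key=len)]
```

Once a pool is chosen, any of the algorithms listed earlier can pick a server within it.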

Caching

Store frequently accessed data in a faster layer, closer to where it is read, to cut latency and database load.

Cache layers (from fastest to slowest):

Browser cache (client)
CDN (edge)
Application cache — Redis/Memcached (server)
Database query cache (database)

Cache strategies:

Cache-aside (lazy): App checks cache → miss → reads DB → writes to cache
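Write-through: App writes to the cache and the DB in the same request (cache stays fresh; writes are slower)
Write-behind: App writes to the cache; the cache writes to the DB asynchronously (fast, but writes not yet flushed are lost if the cache fails)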
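The three strategies differ mainly in when the database gets written. The sketch below uses dicts as stand-ins for the cache and the database, and a list as the write-behind buffer. All three are assumptions for illustration. In a real system, `flush()` would run in a background worker.

```python
class Store:
    """Dict-backed stand-ins for a cache (e.g. Redis) and a database."""

    def __init__(self):
        self.cache: dict[str, str] = {}
        self.db: dict[str, str] = {}
        self.pending: list[tuple[str, str]] = []  # write-behind buffer

    # Cache-aside (lazy): the application loads the cache on a miss.
    def read_cache_aside(self, key):
        if key in self.cache:
            return self.cache[key]          # hit
        value = self.db.get(key)            # miss: go to the database
        if value is not None:
            self.cache[key] = value         # populate for later reads
        return value

    # Write-through: update the DB and the cache in the same request.
    def write_through(self, key, value):
        self.db[key] = value
        self.cache[key] = value

    # Write-behind: update the cache now and persist later.
    def write_behind(self, key, value):
        self.cache[key] = value
        self.pending.append((key, value))   # lost if the process dies before flush()

    def flush(self):
        """Persist buffered writes; normally run by a background worker."""
        for key, value in self.pending:
            self.db[key] = value
        self.pending.clear()
```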

Cache invalidation (the hard problem):

TTL (Time to Live): Key expires after N seconds. Simple, but reads can return stale data for up to N seconds.
Delete on write: Delete the cache key whenever the underlying data is written. Much fresher than TTL alone, but causes more cache misses and DB reads.
Event-based: Publish cache invalidation events when data changes.
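TTL and delete-on-write combine naturally, since the TTL bounds how long a missed invalidation can leave stale data in the cache. Below is a minimal in-process sketch. `TTLCache` is a hypothetical name, and the clock is injectable so expiry can be tested. Redis provides the TTL part natively (for example, `SET key value EX 60`).

```python
import time


class TTLCache:
    """In-process cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock                  # injectable for testing
        self._data: dict[str, tuple[object, float]] = {}

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:      # expired: treat as a miss
            del self._data[key]
            return None
        return value

    def invalidate(self, key):
        """Call on every write to the underlying data."""
        self._data.pop(key, None)
```

In an event-based setup, a subscriber that receives change events would call `invalidate()`.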

Database choices

Need	Typical choice
Structured data, complex joins, ACID	PostgreSQL
Flexible schema, document-shaped data	MongoDB
Key-value with expiry, caching	Redis
Time-series data, metrics	InfluxDB / TimescaleDB
Search and full-text queries	Elasticsearch
Wide-column, massive scale	Cassandra / DynamoDB

Scaling patterns

Vertical scaling (scale up): Bigger machine. Simple but has a ceiling.

Horizontal scaling (scale out): More machines. More complex, but can grow far beyond any single machine.

Database scaling:

Read replicas: Route reads to copies, writes to primary
Sharding: Split data across multiple databases by key (e.g., user_id % 4)
Connection pooling: Reuse database connections instead of creating new ones
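Read replicas and sharding are often combined. The sketch below covers only the routing logic and assumes each shard has one primary and a list of replicas. The class and host names are hypothetical. The `user_id % 4` rule comes from the example above.

```python
import itertools


class ShardedDatabase:
    """Route queries by shard key; each shard has one primary and some replicas."""

    def __init__(self, shards):
        # shards: list of {"primary": str, "replicas": [str, ...]}
        self.shards = shards
        # Rotate reads across each shard's replicas (fall back to the primary if none).
        self._replica_cycles = [
            itertools.cycle(s["replicas"] or [s["primary"]]) for s in shards
        ]

    def shard_for(self, user_id: int) -> int:
        return user_id % len(self.shards)   # e.g. user_id % 4

    def node_for_write(self, user_id: int) -> str:
        return self.shards[self.shard_for(user_id)]["primary"]

    def node_for_read(self, user_id: int) -> str:
        # Replicas lag the primary slightly, so reads here may be a little stale.
        return next(self._replica_cycles[self.shard_for(user_id)])


shards = [
    {"primary": f"db{i}-primary", "replicas": [f"db{i}-replica-a", f"db{i}-replica-b"]}
    for i in range(4)
]
router = ShardedDatabase(shards)
```

Plain modulo has a known cost: changing the shard count (say, from 4 to 5) remaps most keys. That is why many systems use consistent hashing or a lookup directory instead.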

Message queues

Decouple services so they don’t need to be online simultaneously.

Producer → Queue → Consumer

Use cases:
- Email sending (don't block the API response)
- Image processing (async, can be slow)
- Analytics events (fire and forget)
- Order processing (reliable, retry on failure)

RabbitMQ: Traditional message broker, smart routing, acknowledgments for at-least-once delivery
Kafka: Event streaming, high throughput, message replay, log-based
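Here is a single-process sketch of Producer → Queue → Consumer with retries. It uses Python's standard `queue` and `threading` modules in place of a broker. The `send_email` task, the retry limit, and the failure rule are hypothetical. A broker such as RabbitMQ or Kafka adds persistence and delivery across machines.

```python
import queue
import threading

MAX_ATTEMPTS = 3                    # assumption: give up after 3 attempts
jobs: queue.Queue = queue.Queue()
sent: list[str] = []
failed: list[str] = []


def send_email(address: str) -> None:
    """Hypothetical task that always fails for addresses without '@'."""
    if "@" not in address:
        raise ValueError(f"bad address: {address}")
    sent.append(address)


def consumer() -> None:
    while True:
        item = jobs.get()
        if item is None:            # sentinel: shut down
            jobs.task_done()
            return
        address, attempt = item
        try:
            send_email(address)
        except ValueError:
            if attempt + 1 < MAX_ATTEMPTS:
                jobs.put((address, attempt + 1))  # re-enqueue for another try
            else:
                failed.append(address)            # a dead-letter queue in practice
        finally:
            jobs.task_done()


worker = threading.Thread(target=consumer)
worker.start()

# Producer: the API handler enqueues the job and returns immediately.
for addr in ["a@example.com", "not-an-email", "b@example.com"]:
    jobs.put((addr, 0))

jobs.join()      # wait until every job, including retries, is processed
jobs.put(None)   # stop the consumer
worker.join()
```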

Consistency vs Availability (CAP theorem)

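A distributed system can't guarantee all three of these properties at once. When a network partition occurs, it has to give up either consistency or availability:

Consistency: Every read returns the latest write (or an error)
Availability: Every request gets a non-error response, though it may not reflect the latest write
Partition tolerance: The system keeps working even when messages between nodes are lost or delayed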

In practice, you always need partition tolerance, so you choose between:

CP (Consistency + Partition tolerance): Bank transactions, inventory counts
AP (Availability + Partition tolerance): Social media feeds, analytics

Rate limiting

Protect your system from abuse and overload.

Algorithms:

Token bucket: Tokens refill at a fixed rate; each request costs one token
Sliding window: Count requests in a moving time window
Fixed window: Count requests per fixed interval (simpler, less accurate at boundaries)
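Below are minimal single-process sketches of the three algorithms. The class names are hypothetical, and the clock is injectable for testing. The sliding window shown is the log variant, which stores one timestamp per request. A distributed limiter would keep this state in a shared store such as Redis.

```python
import time
from collections import deque


class TokenBucket:
    """Allow bursts up to `capacity`; refill `rate` tokens per second."""
    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity, self.rate, self.clock = capacity, rate, clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1                # each request costs one token
            return True
        return False


class FixedWindow:
    """At most `limit` requests per aligned window of `window` seconds."""
    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.current_window = None
        self.count = 0

    def allow(self) -> bool:
        window_id = int(self.clock() // self.window)
        if window_id != self.current_window:  # new window: reset the counter
            self.current_window, self.count = window_id, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLog:
    """At most `limit` requests in any trailing `window` seconds."""
    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.timestamps = deque()

    def allow(self) -> bool:
        now = self.clock()
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()       # drop requests that left the window
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

With `FixedWindow(limit=2, window=10)`, two requests at t=9 and two more at t=10 are all allowed: four requests in one second. `SlidingWindowLog` with the same limits rejects the third one. That is the boundary inaccuracy noted above.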

Common system design problems

Problem	Key concepts
URL shortener	Hashing, base62 encoding, read-heavy caching
Chat system	WebSockets, message queues, presence service
News feed	Fan-out on write vs read, ranking, caching
Rate limiter	Token bucket, Redis counters, distributed coordination
File storage	Object storage (S3), CDN, chunked uploads, metadata DB
Search	Inverted index, Elasticsearch, ranking algorithms
Notification	Queue-based, multi-channel (push, email, SMS), preference service
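For the URL-shortener row, one common approach is to give each new URL a unique integer ID, for example from a database sequence, and base62-encode that ID into the short code. This is an assumption here; hashing the long URL is the other option the table lists. A sketch:

```python
import string

# 0-9, a-z, A-Z: 62 URL-safe characters.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode(n: int) -> str:
    """Convert a non-negative integer ID to a base62 string."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode(s: str) -> int:
    """Inverse of encode()."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Seven base62 characters give 62^7 ≈ 3.5 trillion codes. That is far more than the roughly 183 billion URLs produced by 100M per day over 5 years.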

Interview tips

Think out loud — The interviewer wants to see your thought process
Start simple, then optimize — Don’t jump to microservices immediately
Quantify everything — “A lot of traffic” is vague; “10K requests/sec” is specific
Name the trade-offs — Every decision has a downside; acknowledge it
Ask if they want you to go deeper — Don’t spend 10 minutes on something they don’t care about
Draw a diagram — Visual communication is clearer than verbal description