System Design Interview Basics
Core concepts and patterns you need for system design interviews, explained simply.
What interviewers are looking for
System design interviews test your ability to think through trade-offs, not memorize architectures. Interviewers evaluate:
- Requirements gathering — Can you ask the right clarifying questions?
- High-level design — Can you sketch a reasonable architecture?
- Deep dives — Can you reason about specific components in detail?
- Trade-offs — Can you articulate why you chose one approach over another?
The framework (use this every time)
Step 1: Clarify requirements (3-5 minutes)
Never start designing immediately. Ask:
- Functional: What does the system do? What are the core features?
- Scale: How many users? How many requests per second?
- Constraints: Latency requirements? Consistency requirements?
- Non-functional: Availability target? Data retention?
Example: "Design a URL shortener"
Clarifying questions:
- How many URLs shortened per day? → 100M
- How many redirects per day? → 10B (100:1 read/write ratio)
- Max URL length? → 2048 characters
- Custom aliases allowed? → Yes
- Analytics needed? → Click count, basic stats
- Expiration? → Optional, default 5 years
Step 2: Estimate scale (2-3 minutes)
Back-of-envelope math shows you understand the problem’s magnitude:
Writes: 100M / day = ~1200/sec
Reads: 10B / day = ~115,000/sec
Storage: 100M × 500 bytes (avg) = 50 GB/day, 18 TB/year
This tells you: read-heavy, needs caching, moderate write throughput.
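The arithmetic above can be reproduced in a few lines. This is a minimal sketch that uses only the numbers from the example; the variable names are illustrative, and the 5-year total assumes every URL is kept for the default expiration period.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

writes_per_day = 100_000_000      # 100M URLs shortened per day
reads_per_day = 10_000_000_000    # 10B redirects per day
bytes_per_record = 500            # average record size from the estimate

writes_per_sec = writes_per_day / SECONDS_PER_DAY
reads_per_sec = reads_per_day / SECONDS_PER_DAY
read_write_ratio = reads_per_day / writes_per_day

storage_gb_per_day = writes_per_day * bytes_per_record / 1e9
storage_tb_per_year = storage_gb_per_day * 365 / 1e3
# Assumption: nothing expires early, so 5 years of data accumulate.
storage_tb_5_years = storage_tb_per_year * 5

print(f"writes/sec:       {writes_per_sec:,.0f}")
print(f"reads/sec:        {reads_per_sec:,.0f}")
print(f"read/write ratio: {read_write_ratio:.0f}:1")
print(f"storage per day:  {storage_gb_per_day:.0f} GB")
print(f"storage per year: {storage_tb_per_year:.2f} TB")
print(f"storage, 5 years: {storage_tb_5_years:.2f} TB")
```

The exact figures are about 1,157 writes/sec and 115,741 reads/sec. In an interview, rounding to ~1,200 and ~115,000 is fine, because the goal is the order of magnitude.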
Step 3: High-level design (5-10 minutes)
Sketch the major components:
Client → Load Balancer → API Servers → Cache (Redis)
                                     → Database
                                     → Analytics Queue → Analytics Service
Explain each component’s role and why it’s there.
Step 4: Deep dive (15-20 minutes)
Pick 2-3 components and go deep. The interviewer may direct you.
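For the URL shortener example, a likely deep-dive topic is how short codes are generated. The sketch below shows one possible approach, assuming each URL gets a unique integer ID (for example, from a database sequence) that is then base62-encoded. The function names and the alphabet order are illustrative choices, not a standard.

```python
import string

# 0-9, a-z, A-Z: 62 URL-safe characters (this ordering is an assumption).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Turn a non-negative integer ID into a short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Recover the integer ID from a short code."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


# How long must codes be? 100M URLs/day for 5 years:
total_urls = 100_000_000 * 365 * 5
print(total_urls, 62 ** 6, 62 ** 7)
```

With these numbers, 6 characters are not enough for five years of URLs but 7 are, which is the kind of conclusion worth stating out loud. One trade-off to name is that sequential IDs make codes guessable. Hashing or random codes avoid that but need collision handling.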
Core concepts
Load Balancing
Distributes traffic across multiple servers.
Algorithms:
- Round robin — Simple rotation (works when servers are identical)
- Least connections — Routes to the server with fewest active connections
- Weighted — Sends more traffic to more powerful servers
- IP hash — Same client always hits the same server (useful for sessions)
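The selection logic behind three of these algorithms can be sketched in a few lines. This is a simplified, single-process illustration with hypothetical class and function names. Real load balancers also handle health checks, concurrency, and servers joining or leaving.

```python
import hashlib
import itertools


class RoundRobin:
    """Rotate through the servers in order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the connection to `server` closes.
        self.active[server] -= 1


def ip_hash(client_ip, servers):
    """Map a client IP to a server; stable while the server list is unchanged."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["app-1", "app-2", "app-3"]
rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])
print(ip_hash("203.0.113.7", servers))
```

Note the caveat in `ip_hash`: the mapping only stays stable while the server list does not change, which is worth mentioning when proposing it for session affinity.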
Layer 4 vs Layer 7:
- L4: Routes based on IP/port (faster, less intelligent)
- L7: Routes based on HTTP content — URL path, headers, cookies (more flexible)
Caching
Store frequently accessed data closer to the user.
Cache layers (from closest to the user to closest to the data):
- Browser cache (client)
- CDN (edge)
- Application cache — Redis/Memcached (server)
- Database query cache (database)
Cache strategies:
- Cache-aside (lazy): App checks cache → miss → reads DB → writes to cache
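- Write-through: App writes to the cache and the DB in the same operation; the write succeeds only when both do (consistent, but slower writes)
- Write-behind: App writes to cache; cache asynchronously writes to DB (fast, but data can be lost if the cache fails before it flushes)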
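The cache-aside read path and the write-through write path can be sketched with plain dictionaries standing in for Redis and the database. The class and method names here are hypothetical, and the `db_reads` counter exists only to make cache hits visible.

```python
class CacheAsideStore:
    """Toy model: dicts stand in for the cache and the database."""

    def __init__(self, database):
        self.database = database  # source of truth
        self.cache = {}           # stand-in for Redis/Memcached
        self.db_reads = 0         # counts how often the DB is hit

    def get(self, key):
        # Cache-aside: check the cache first.
        if key in self.cache:
            return self.cache[key]
        # Miss: read the DB, then populate the cache for next time.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put_write_through(self, key, value):
        # Write-through: update the DB and the cache in one operation.
        self.database[key] = value
        self.cache[key] = value


store = CacheAsideStore({"abc123": "https://example.com/long"})
store.get("abc123")   # miss: reads the DB and fills the cache
store.get("abc123")   # hit: served from the cache
print(store.db_reads)
```

In this sketch, only the first read of a key touches the database. A real implementation would also need an eviction or expiry policy so the cache does not grow without bound.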
Cache invalidation (the hard problem):
- TTL (Time to Live): Key expires after N seconds. Simple, eventually consistent.
- Invalidate on write: Delete the cache key whenever the underlying data changes. More consistent, but every write causes a cache miss and extra traffic.
- Event-based: Publish cache invalidation events when data changes.
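TTL expiry and deleting a key on write can both be shown with a small in-memory cache. This is a sketch with hypothetical names. The `clock` parameter is an assumption added so that expiry can be demonstrated without waiting.

```python
import time


class TTLCache:
    """In-memory cache where each entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._entries[key]
            return None
        return value

    def invalidate(self, key):
        # Explicit invalidation: call this when the underlying data changes.
        self._entries.pop(key, None)


# Demo with a fake clock so time can be advanced manually.
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("user:1", {"name": "Ada"})
fresh = cache.get("user:1")    # still within the TTL
now[0] = 61.0
expired = cache.get("user:1")  # TTL has passed
print(fresh, expired)
```

With TTL alone, readers may see stale data for up to `ttl_seconds` after a change. Calling `invalidate` on write narrows that window at the cost of more cache misses.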
Database choices
| Need | Choose |
|---|---|
| Structured data, complex joins, ACID | PostgreSQL |
| Flexible schema, document-shaped data | MongoDB |
| Key-value with expiry, caching | Redis |
| Time-series data, metrics | InfluxDB / TimescaleDB |
| Search and full-text queries | Elasticsearch |
| Wide-column, massive scale | Cassandra / DynamoDB |
Scaling patterns
Vertical scaling (scale up): Bigger machine. Simple but has a ceiling.
Horizontal scaling (scale out): More machines. More complex, but no single-machine ceiling; the practical limits come from coordination and data distribution.
Database scaling:
- Read replicas: Route reads to copies, writes to primary
- Sharding: Split data across multiple databases by key (e.g., user_id % 4)
- Connection pooling: Reuse database connections instead of creating new ones
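The modulo sharding example can be made concrete, along with its main trade-off. The sketch below uses hypothetical function names and measures how many keys would land on a different shard if the shard count grew from 4 to 5.

```python
NUM_SHARDS = 4


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Route a user to a shard with simple modulo hashing."""
    return user_id % num_shards


def fraction_moved(user_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    ids = list(user_ids)
    moved = sum(
        1 for uid in ids
        if shard_for(uid, old_shards) != shard_for(uid, new_shards)
    )
    return moved / len(ids)


print(shard_for(42))                         # shard index for user 42
print(fraction_moved(range(1000), 4, 5))     # share of keys that must move
```

For these 1,000 sequential IDs, 80% of keys change shards when going from 4 to 5. This is why interview answers often mention consistent hashing or a lookup-based shard map as alternatives to plain modulo.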
Message queues
Decouple services so they don’t need to be online simultaneously.
Producer → Queue → Consumer
Use cases:
- Email sending (don't block the API response)
- Image processing (async, can be slow)
- Analytics events (fire and forget)
- Order processing (reliable, retry on failure)
RabbitMQ: Traditional message broker with flexible routing and delivery acknowledgements.
Kafka: Log-based event streaming with high throughput and message replay.
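The producer, queue, and consumer flow can be illustrated in a single process with the standard library. In this sketch, `queue.Queue` stands in for a real broker such as RabbitMQ or Kafka, and the email "sending" is simulated. Unlike a real broker, an in-process queue is not durable and does not survive a restart.

```python
import queue
import threading


def consumer(jobs, results):
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        # Simulated slow work, e.g. sending an email.
        results.append(f"sent email to {job}")
        jobs.task_done()


email_queue = queue.Queue()
sent = []
worker = threading.Thread(target=consumer, args=(email_queue, sent))
worker.start()

# Producer: enqueue and return immediately instead of blocking on the work.
for address in ["a@example.com", "b@example.com"]:
    email_queue.put(address)

email_queue.put(None)   # tell the consumer to stop
email_queue.join()      # wait until every queued item is processed
worker.join()
print(sent)
```

The producer only pays the cost of `put`, which is the point of the pattern: the API response does not wait for the slow work to finish.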
Consistency vs Availability (CAP theorem)
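A distributed system cannot guarantee all three of these properties at once; when a network partition occurs, it must give up either consistency or availability:
- Consistency: Every read returns the latest write (or an error)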
- Availability: Every request gets a response
- Partition tolerance: System works despite network failures
In practice, network partitions are unavoidable in a distributed system, so the real choice during a partition is between:
- CP (Consistency + Partition tolerance): Bank transactions, inventory counts
- AP (Availability + Partition tolerance): Social media feeds, analytics
Rate limiting
Protect your system from abuse and overload.
Algorithms:
- Token bucket: Tokens refill at a fixed rate; each request costs one token
- Sliding window: Count requests in a moving time window
- Fixed window: Count requests per fixed interval (simpler, less accurate at boundaries)
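Two of these algorithms can be sketched as small in-memory classes. This is a single-process illustration with hypothetical names; a distributed limiter would typically keep this state in a shared store. The `clock` parameter is an assumption that lets time be controlled in the demo, and the sliding window shown is the log variant, which stores one timestamp per accepted request.

```python
import time
from collections import deque


class TokenBucket:
    """Tokens refill at a fixed rate; each request costs one token."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Add tokens for the time elapsed, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class SlidingWindowLog:
    """Allow at most `limit` requests in any window of `window_sec` seconds."""

    def __init__(self, limit, window_sec, clock=time.monotonic):
        self.limit = limit
        self.window_sec = window_sec
        self.clock = clock
        self.timestamps = deque()

    def allow(self):
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window_sec:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


# Demo with a fake clock: a burst of 3 requests against a bucket of 2.
now = [0.0]
bucket = TokenBucket(capacity=2, refill_per_sec=1, clock=lambda: now[0])
burst = [bucket.allow() for _ in range(3)]
now[0] = 1.0            # one second later, one token has refilled
after_refill = bucket.allow()
print(burst, after_refill)
```

The token bucket permits short bursts up to `capacity` while enforcing the average rate. The sliding window log is more precise at window boundaries but uses memory proportional to the limit.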
Common system design problems
| Problem | Key concepts |
|---|---|
| URL shortener | Hashing, base62 encoding, read-heavy caching |
| Chat system | WebSockets, message queues, presence service |
| News feed | Fan-out on write vs read, ranking, caching |
| Rate limiter | Token bucket, Redis counters, distributed coordination |
| File storage | Object storage (S3), CDN, chunked uploads, metadata DB |
| Search | Inverted index, Elasticsearch, ranking algorithms |
| Notification | Queue-based, multi-channel (push, email, SMS), preference service |
Interview tips
- Think out loud — The interviewer wants to see your thought process
- Start simple, then optimize — Don’t jump to microservices immediately
- Quantify everything — “A lot of traffic” is vague; “10K requests/sec” is specific
- Name the trade-offs — Every decision has a downside; acknowledge it
- Ask if they want you to go deeper — Don’t spend 10 minutes on something they don’t care about
- Draw a diagram — Visual communication is clearer than verbal description