Design Problems


System Design Made Simple

Build the data structures that power Instagram, YouTube, and Netflix! Learn LRU/LFU caches, autocomplete systems, and distributed architectures through interactive visualizations.

⏱️ 45-60 mins 📊 Advanced Level 🎮 5 Interactive Demos 💡 Real-world Systems

Why Should You Care About System Design?

These are the building blocks of every app you use daily! Let's see what's under the hood:

📱 Instagram's Feed Cache

Instagram uses LRU cache to store your feed. When you scroll, recently viewed posts stay in memory for instant access when you scroll back. Old posts get evicted to save memory!

📺 YouTube's Video Recommendations

YouTube uses LFU (Least Frequently Used) cache to keep popular videos ready for streaming. Viral videos stay cached while rarely watched ones are removed.

🔍 Google's Search Autocomplete

When you type in Google, it uses a Trie data structure to instantly suggest completions from billions of previous searches. That's why it's so fast!

🎬 Netflix's CDN Distribution

Netflix uses consistent hashing to distribute movies across servers worldwide. When a server is added or removed, only minimal content needs to be moved!

🐦 Twitter's Timeline Cache

Twitter pre-computes and caches timelines for active users. They use time-based eviction to keep feeds fresh while managing billions of tweets efficiently.

The Basics - LRU Cache Magic!

What is an LRU Cache?

Imagine a small bookshelf that can only hold 5 books. When you add a 6th book, you remove the one you haven't read in the longest time. That's LRU (Least Recently Used) cache!

Try It: Interactive LRU Cache

Add items to the cache and watch how it manages space:

Cache (capacity: 3):

Build Your First LRU Cache! 🚀
# LRU Cache - Like your browser history!
class LRUCache:
    # Think of it as a queue at a coffee shop:
    # a just-served customer (a used item) rejoins the back of the line,
    # and whoever has waited longest gets kicked out when the line is full!
    def __init__(self, capacity):
        self.capacity = capacity  # How many items we can store
        self.cache = {}           # Quick lookup dictionary
        self.order = []           # Track usage order (oldest first)

    def get(self, key):
        if key in self.cache:
            # Mark as most recently used
            self.order.remove(key)
            self.order.append(key)
            return self.cache[key]  # Cache hit! 🎯
        return -1  # Cache miss 😢

    def put(self, key, value):
        if key in self.cache:
            # Update existing item
            self.order.remove(key)
        elif len(self.cache) >= self.capacity:
            # Evict least recently used (front of the line)
            oldest = self.order.pop(0)
            del self.cache[oldest]
            print(f"👋 Evicted {oldest} to make room!")
        # Add new item as most recent
        self.cache[key] = value
        self.order.append(key)
        print(f"✅ Added {key}: {value}")

# Example: Your recent YouTube videos
cache = LRUCache(3)
cache.put("video1", "Cat fails")
cache.put("video2", "Cooking tips")
cache.put("video3", "Music video")
cache.get("video1")              # Watched again!
cache.put("video4", "Tutorial")  # video2 gets evicted!

Common Mistake: Not Updating Access Order

# ❌ WRONG - Forgetting to update order on GET
def bad_get(self, key):
    if key in self.cache:
        return self.cache[key]  # Oops! Didn't mark as used!
    return -1

Why it's wrong: The item won't be marked as recently used and might get evicted even though you just accessed it!

The Right Way - Optimized with Doubly Linked List

# ✅ CORRECT - O(1) operations with HashMap + Doubly Linked List
# HashMap gives instant lookup
# Doubly Linked List maintains order efficiently
# Head = Most Recent, Tail = Least Recent
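As a sketch of that idea, Python's `collections.OrderedDict` bundles the hash map and the doubly linked list into one structure, giving O(1) `get` and `put` (the class name `FastLRUCache` is ours, not part of the lesson):

```python
from collections import OrderedDict

class FastLRUCache:
    """O(1) LRU: OrderedDict is a hash map backed by a doubly linked list."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)  # mark as most recently used
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used

cache = FastLRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes most recent
cache.put("c", 3)  # evicts "b", the least recently used
print(cache.get("b"))  # -1
```

`move_to_end` and `popitem(last=False)` replace the list `remove`/`pop(0)` calls, which were the O(n) bottleneck in the simple version.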

Common Design Patterns

Intermediate

LFU Cache - The Popularity Contest!

Unlike LRU which tracks recency, LFU (Least Frequently Used) tracks popularity. Think of it like Spotify keeping your most-played songs cached!

LFU Cache - Track What's Hot! 🔥
# LFU - Like trending videos on YouTube!
class SimpleLFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}        # key -> value
        self.frequencies = {}  # key -> use count

    def get(self, key):
        if key in self.cache:
            # Increment frequency (it's getting popular!)
            self.frequencies[key] += 1
            return self.cache[key]
        return -1

    def put(self, key, value):
        if key in self.cache:
            self.cache[key] = value
            self.frequencies[key] += 1
        else:
            if len(self.cache) >= self.capacity:
                # Find least frequently used
                min_freq = min(self.frequencies.values())
                for k, freq in self.frequencies.items():
                    if freq == min_freq:
                        del self.cache[k]
                        del self.frequencies[k]
                        break
            self.cache[key] = value
            self.frequencies[key] = 1

# Example: Popular songs cache
lfu = SimpleLFUCache(2)
lfu.put("song1", "Hit Single")
lfu.put("song2", "New Release")
lfu.get("song1")                 # Played again - frequency = 2
lfu.put("song3", "Viral Track")  # song2 evicted (freq = 1)
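Note that this simple version scans every key to find the minimum frequency, so eviction is O(n). A common O(1) refinement buckets keys by use count and tracks the minimum count directly. A sketch (the class name `LFUCacheO1` and the OrderedDict tie-breaking are our choices):

```python
from collections import defaultdict, OrderedDict

class LFUCacheO1:
    """O(1) LFU sketch: keys grouped into buckets by use count.
    Ties within a frequency are broken LRU-style via OrderedDict order."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}                          # key -> value
        self.freq = {}                           # key -> use count
        self.buckets = defaultdict(OrderedDict)  # count -> keys in LRU order
        self.min_freq = 0

    def _touch(self, key):
        # Move key from its current frequency bucket to the next one up
        f = self.freq[key]
        del self.buckets[f][key]
        if not self.buckets[f] and self.min_freq == f:
            self.min_freq = f + 1
        self.freq[key] = f + 1
        self.buckets[f + 1][key] = None

    def get(self, key):
        if key not in self.cache:
            return -1
        self._touch(key)
        return self.cache[key]

    def put(self, key, value):
        if self.capacity <= 0:
            return
        if key in self.cache:
            self.cache[key] = value
            self._touch(key)
            return
        if len(self.cache) >= self.capacity:
            # Evict the least frequent key; LRU among ties
            victim, _ = self.buckets[self.min_freq].popitem(last=False)
            del self.cache[victim]
            del self.freq[victim]
        self.cache[key] = value
        self.freq[key] = 1
        self.buckets[1][key] = None
        self.min_freq = 1

lfu = LFUCacheO1(2)
lfu.put("song1", "Hit Single")
lfu.put("song2", "New Release")
lfu.get("song1")                 # frequency of song1 is now 2
lfu.put("song3", "Viral Track")  # evicts song2 (lowest frequency)
print(lfu.get("song2"))  # -1
```

Every operation touches only one bucket, so there is no scan over all keys at eviction time.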
Intermediate

Autocomplete System - Type Ahead Magic!

Ever wondered how Google knows what you're typing? It's using a Trie with frequency tracking!

Try It: Autocomplete Demo

Type and see suggestions appear:

Build Google's Autocomplete! 🔍
# Autocomplete - Like Google search suggestions!
class AutocompleteSystem:
    def __init__(self):
        self.trie = {}       # Tree of characters
        self.sentences = {}  # sentence -> frequency
        self.current = ""    # What the user is typing

    def add_sentence(self, sentence, times=1):
        # Add to frequency tracker
        self.sentences[sentence] = self.sentences.get(sentence, 0) + times
        # Add to trie for fast prefix search
        node = self.trie
        for char in sentence:
            if char not in node:
                node[char] = {}
            node = node[char]
        node['#'] = sentence  # Mark end of sentence

    def input(self, c):
        if c == '#':
            # User finished typing - add to history
            self.add_sentence(self.current)
            self.current = ""
            return []
        self.current += c
        return self._get_suggestions(self.current)

    def _get_suggestions(self, prefix):
        # Navigate to prefix in trie
        node = self.trie
        for char in prefix:
            if char not in node:
                return []
            node = node[char]
        # Find all sentences with this prefix
        suggestions = []
        self._dfs(node, suggestions)
        # Sort by frequency (most popular first)
        suggestions.sort(key=lambda x: -self.sentences[x])
        return suggestions[:3]  # Top 3

    def _dfs(self, node, suggestions):
        # Collect every complete sentence below this trie node
        for char, child in node.items():
            if char == '#':
                suggestions.append(child)
            else:
                self._dfs(child, suggestions)

# Example usage
system = AutocompleteSystem()
system.add_sentence("i love you", 5)
system.add_sentence("i love coding", 2)
print(system.input('i'))  # ["i love you", "i love coding"]
print(system.input(' '))  # ["i love you", "i love coding"]
print(system.input('l'))  # ["i love you", "i love coding"]

Practice Problems

Beginner

Problem 1: Time-Based Key-Value Store

Design a key-value store that can retrieve values at specific timestamps.

Think About It First!

It's like Git version history - you want to see what a file looked like at any point in time!

Solution
class TimeMap:
    def __init__(self):
        self.store = {}  # key -> [(timestamp, value)]

    def set(self, key, value, timestamp):
        if key not in self.store:
            self.store[key] = []
        self.store[key].append((timestamp, value))

    def get(self, key, timestamp):
        if key not in self.store:
            return ""
        # Binary search for the latest value at or before timestamp
        values = self.store[key]
        left, right = 0, len(values) - 1
        result = ""
        while left <= right:
            mid = (left + right) // 2
            if values[mid][0] <= timestamp:
                result = values[mid][1]
                left = mid + 1
            else:
                right = mid - 1
        return result

# Test it!
timemap = TimeMap()
timemap.set("user", "Alice", 1)
timemap.set("user", "Bob", 3)
print(timemap.get("user", 2))  # "Alice"
print(timemap.get("user", 4))  # "Bob"
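The hand-rolled binary search can also lean on the standard library's `bisect` module. A sketch (the class name `BisectTimeMap` is ours), assuming timestamps arrive in increasing order:

```python
import bisect

class BisectTimeMap:
    def __init__(self):
        self.store = {}  # key -> ([timestamps], [values]), parallel lists

    def set(self, key, value, timestamp):
        ts, vals = self.store.setdefault(key, ([], []))
        ts.append(timestamp)  # assumes timestamps only increase
        vals.append(value)

    def get(self, key, timestamp):
        if key not in self.store:
            return ""
        ts, vals = self.store[key]
        i = bisect.bisect_right(ts, timestamp)  # count of entries <= timestamp
        return vals[i - 1] if i else ""

tm = BisectTimeMap()
tm.set("user", "Alice", 1)
tm.set("user", "Bob", 3)
print(tm.get("user", 2))  # Alice
print(tm.get("user", 4))  # Bob
```

`bisect_right` returns how many stored timestamps are at or before the query, so `vals[i - 1]` is exactly the value the manual loop finds.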
Intermediate

Problem 2: Design Twitter

Design a simplified version of Twitter with follow, unfollow, and tweet features.

Think About It First!

You need to track users, their tweets, and who follows whom. How would you get a user's feed?

Solution
import heapq
from collections import defaultdict

class Twitter:
    def __init__(self):
        self.time = 0
        self.tweets = defaultdict(list)    # user -> [(time, tweetId)]
        self.followees = defaultdict(set)  # user -> set of followees

    def postTweet(self, userId, tweetId):
        self.tweets[userId].append((self.time, tweetId))
        self.time -= 1  # Negative for max heap

    def getNewsFeed(self, userId):
        # Get tweets from user and followees
        heap = []
        # Add user's own tweets
        if userId in self.tweets:
            for tweet in self.tweets[userId][-10:]:  # Last 10
                heap.append(tweet)
        # Add followees' tweets
        for followee in self.followees[userId]:
            if followee in self.tweets:
                for tweet in self.tweets[followee][-10:]:
                    heap.append(tweet)
        # Get 10 most recent
        heapq.heapify(heap)
        result = []
        while heap and len(result) < 10:
            time, tweetId = heapq.heappop(heap)
            result.append(tweetId)
        return result

    def follow(self, followerId, followeeId):
        if followerId != followeeId:
            self.followees[followerId].add(followeeId)

    def unfollow(self, followerId, followeeId):
        self.followees[followerId].discard(followeeId)
Advanced

Problem 3: Rate Limiter

Design a rate limiter that allows only N requests per minute from each user.

Think About It First!

Think sliding window - you need to track recent requests and remove old ones!

Solution
from collections import deque
import time

class RateLimiter:
    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.requests = {}  # userId -> deque of timestamps

    def is_allowed(self, userId):
        current_time = time.time()
        if userId not in self.requests:
            self.requests[userId] = deque()
        user_requests = self.requests[userId]
        # Remove old requests outside the window
        while user_requests and user_requests[0] < current_time - self.window:
            user_requests.popleft()
        # Check if allowed
        if len(user_requests) < self.max_requests:
            user_requests.append(current_time)
            return True
        return False

# Example: API rate limiting
limiter = RateLimiter(max_requests=5, window_seconds=60)
for i in range(7):
    allowed = limiter.is_allowed("user1")
    print(f"Request {i+1}: {'✅ Allowed' if allowed else '❌ Blocked'}")
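The sliding window is one policy; another classic is the token bucket, which enforces a steady average rate while still allowing short bursts. A minimal single-user sketch (the class and parameter names are ours):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter sketch: tokens refill at a steady rate,
    and each request spends one. Bursts up to `capacity` are allowed."""
    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.refill_rate = refill_per_second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def is_allowed(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, refill_per_second=5 / 60)  # ~5 requests/min
print(bucket.is_allowed())  # True while tokens remain
```

Compared with the sliding window, it stores one float per user instead of a deque of timestamps, at the cost of a slightly fuzzier window boundary.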

Advanced Techniques

Advanced

Consistent Hashing - Distributed Cache Magic!

How does Netflix distribute movies across thousands of servers? Consistent hashing! It minimizes data movement when servers are added or removed.

Build Netflix's CDN! 🎬
# Consistent Hashing - Distribute like Netflix!
import hashlib

class ConsistentHash:
    def __init__(self, nodes=None, virtual_nodes=150):
        """
        Imagine a clock circle (0-360 degrees).
        Servers and data are placed on this circle;
        data goes to the first server clockwise!
        """
        self.virtual_nodes = virtual_nodes  # More = better distribution
        self.ring = {}         # position -> server
        self.sorted_keys = []  # sorted positions
        if nodes:
            for node in nodes:
                self.add_node(node)

    def _hash(self, key):
        # Convert to a position on the ring
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Add multiple virtual nodes for better distribution
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:{i}"
            hash_val = self._hash(virtual_key)
            self.ring[hash_val] = node
            self.sorted_keys.append(hash_val)
        self.sorted_keys.sort()
        print(f"✅ Added server: {node}")

    def remove_node(self, node):
        # Remove all virtual nodes
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:{i}"
            hash_val = self._hash(virtual_key)
            if hash_val in self.ring:
                del self.ring[hash_val]
                self.sorted_keys.remove(hash_val)
        print(f"👋 Removed server: {node}")

    def get_node(self, key):
        # Find which server should handle this key
        if not self.ring:
            return None
        hash_val = self._hash(key)
        # Find first server clockwise
        for key_hash in self.sorted_keys:
            if hash_val <= key_hash:
                return self.ring[key_hash]
        # Wrap around to first server
        return self.ring[self.sorted_keys[0]]

# Example: Distribute movies across servers
cdn = ConsistentHash(["Server-US", "Server-EU", "Server-Asia"])
print(cdn.get_node("Avengers.mp4"))  # Some server, determined by the hash
print(cdn.get_node("StarWars.mp4"))  # Possibly a different server

# Add new server - minimal redistribution!
cdn.add_node("Server-AU")
print(cdn.get_node("Avengers.mp4"))  # Most keys keep their old server
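One detail: `get_node` above scans `sorted_keys` linearly, which is O(n) in the number of virtual nodes. Production implementations usually binary-search the ring for O(log n) lookups. A standalone sketch with the standard library's `bisect` (the function name is ours; it operates on the same `ring`/`sorted_keys` structures):

```python
import bisect
import hashlib

def get_node_fast(ring, sorted_keys, key):
    """O(log n) ring lookup: binary search for the first position >= hash(key),
    wrapping around to the start of the ring if none is found."""
    if not sorted_keys:
        return None
    hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16)
    i = bisect.bisect_left(sorted_keys, hash_val)  # first position >= hash_val
    if i == len(sorted_keys):
        i = 0  # wrap around the circle
    return ring[sorted_keys[i]]
```

`bisect_left` picks the same "first server clockwise" the linear scan would, just without visiting every virtual node.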
Advanced

Bloom Filter - Space-Efficient Set Membership

How does Chrome know if a URL is malicious without storing millions of bad URLs? Bloom filters!

Chrome's Malware Detection! 🛡️
# Bloom Filter - Probabilistic but super efficient!
import hashlib

class BloomFilter:
    def __init__(self, size=1000, hash_count=3):
        """
        Like a bouncer with a fuzzy memory:
        definitely knows who's NOT on the list,
        but might occasionally think someone is on it when they're not.
        """
        self.size = size
        self.hash_count = hash_count
        self.bit_array = [0] * size

    def _hashes(self, item):
        # Generate multiple hash values
        results = []
        for i in range(self.hash_count):
            hash_val = int(hashlib.md5(f"{item}{i}".encode()).hexdigest(), 16)
            results.append(hash_val % self.size)
        return results

    def add(self, item):
        # Set bits at hash positions
        for hash_val in self._hashes(item):
            self.bit_array[hash_val] = 1

    def might_contain(self, item):
        # Check if all bits are set
        for hash_val in self._hashes(item):
            if self.bit_array[hash_val] == 0:
                return False  # Definitely not in set
        return True  # Probably in set (small false positive chance)

# Example: Malicious URL checker
bloom = BloomFilter(size=10000, hash_count=5)

# Add known bad URLs
bad_urls = ["malware.com", "phishing.net", "virus.org"]
for url in bad_urls:
    bloom.add(url)

# Check URLs
print(bloom.might_contain("malware.com"))  # True (flagged as bad)
print(bloom.might_contain("google.com"))   # Almost certainly False (never added)

Quick Reference

Cache Comparison Chart

Cache Type | Eviction Policy       | Use Case                      | Example
LRU        | Least Recently Used   | Recent access patterns matter | Instagram feed, browser cache
LFU        | Least Frequently Used | Popular items should stay     | YouTube trending, CDN
FIFO       | First In, First Out   | Simple, age-based             | Message queues
TTL        | Time To Live          | Data has an expiry            | DNS cache, session tokens
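The TTL row deserves a tiny sketch of its own, since it is the one policy not implemented earlier on this page. A minimal version (class name ours) that expires entries lazily on read:

```python
import time

class TTLCache:
    """TTL cache sketch: every entry expires `ttl_seconds` after it is written.
    Expired entries are dropped lazily when they are next read."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.cache = {}  # key -> (value, expiry time)

    def put(self, key, value):
        self.cache[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.cache.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self.cache[key]  # expired: evict on access
            return None
        return value

sessions = TTLCache(ttl_seconds=60)
sessions.put("session", "token123")
print(sessions.get("session"))  # token123 (read within 60 s of the put)
```

Lazy expiry keeps `put` and `get` O(1); real systems often add a periodic sweep so dead entries don't linger in memory unread.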

System Design Patterns

Pattern            | Complexity | When to Use                | Plain English
Consistent Hashing | O(log n)   | Distributed systems        | ⚡ Minimal data movement
Bloom Filter       | O(k)       | Space-efficient membership | 💾 Super memory efficient
Rate Limiter       | O(1)       | API throttling             | 🚦 Control request flow
Trie               | O(m)       | Prefix search              | 🔍 Fast autocomplete

Common Design Pitfalls

Pitfall #1: Memory Leaks in Cache

Always have a max size and eviction policy. Unbounded caches will crash your system!

Pitfall #2: Not Handling Concurrent Access

In production, use locks or concurrent data structures for thread safety!
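As a sketch of the idea in Python (class name ours), wrapping every operation in a `threading.Lock` keeps the cache's internal ordering consistent under concurrent access:

```python
import threading
from collections import OrderedDict

class ThreadSafeLRUCache:
    """LRU cache guarded by a lock so concurrent readers and writers
    can't corrupt the internal ordering (a minimal sketch)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key not in self.cache:
                return -1
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]

    def put(self, key, value):
        with self.lock:
            if key in self.cache:
                self.cache.move_to_end(key)
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used

# Hammer the cache from several threads: capacity is never exceeded
c = ThreadSafeLRUCache(100)
threads = [threading.Thread(target=lambda: [c.put(i, i) for i in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(c.cache) <= 100)  # True
```

A single lock is the simplest correct option; under heavy read traffic, sharded locks or concurrent maps reduce contention.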

Pitfall #3: Ignoring Cache Invalidation

"There are only two hard things in Computer Science: cache invalidation and naming things." (Phil Karlton)

When the source of truth changes, stale cache entries must be detected and refreshed or evicted. Decide on an invalidation strategy (write-through, explicit invalidation, or TTL-based expiry) before you ship the cache.