Design Problems


System Design Made Simple

Build the data structures that power Instagram, YouTube, and Netflix! Learn LRU/LFU caches, autocomplete systems, and distributed architectures through interactive visualizations.

⏱️ 45-60 mins 📊 Advanced Level 🎮 5 Interactive Demos 💡 Real-world Systems

Why Should You Care About System Design?

These are the building blocks of every app you use daily! Let's see what's under the hood:

📱 Instagram's Feed Cache

Instagram uses LRU cache to store your feed. When you scroll, recently viewed posts stay in memory for instant access when you scroll back. Old posts get evicted to save memory!

📺 YouTube's Video Recommendations

YouTube uses LFU (Least Frequently Used) cache to keep popular videos ready for streaming. Viral videos stay cached while rarely watched ones are removed.

🔍 Google's Search Autocomplete

When you type in Google, it uses a Trie data structure to instantly suggest completions from billions of previous searches. That's why it's so fast!

🎬 Netflix's CDN Distribution

Netflix uses consistent hashing to distribute movies across servers worldwide. When a server is added or removed, only minimal content needs to be moved!

🐦 Twitter's Timeline Cache

Twitter pre-computes and caches timelines for active users. They use time-based eviction to keep feeds fresh while managing billions of tweets efficiently.

The Basics - LRU Cache Magic!

What is an LRU Cache?

Imagine a small bookshelf that can only hold 5 books. When you add a 6th book, you remove the one you haven't read in the longest time. That's LRU (Least Recently Used) cache!

Try It: Interactive LRU Cache

Add items to the cache and watch how it manages space:

Cache (capacity: 3):

Build Your First LRU Cache! 🚀
# LRU Cache - Like your browser history!
class LRUCache:
    # Think of it as a queue at a coffee shop:
    # a just-served customer (a used item) rejoins the back of the line,
    # and whoever has waited longest gets kicked out when the line is full!
    def __init__(self, capacity):
        self.capacity = capacity  # How many items we can store
        self.cache = {}           # Quick lookup dictionary
        self.order = []           # Track usage order (oldest first)

    def get(self, key):
        if key in self.cache:
            # Mark as most recently used
            self.order.remove(key)
            self.order.append(key)
            return self.cache[key]  # Cache hit! 🎯
        return -1  # Cache miss 😢

    def put(self, key, value):
        if key in self.cache:
            # Update existing item
            self.order.remove(key)
        elif len(self.cache) >= self.capacity:
            # Evict least recently used (front of the line)
            oldest = self.order.pop(0)
            del self.cache[oldest]
            print(f"👋 Evicted {oldest} to make room!")
        # Add new item as most recent
        self.cache[key] = value
        self.order.append(key)
        print(f"✅ Added {key}: {value}")

# Example: Your recent YouTube videos
cache = LRUCache(3)
cache.put("video1", "Cat fails")
cache.put("video2", "Cooking tips")
cache.put("video3", "Music video")
cache.get("video1")              # Watched again!
cache.put("video4", "Tutorial")  # video2 gets evicted!

Common Mistake: Not Updating Access Order

# ❌ WRONG - Forgetting to update order on GET
def bad_get(self, key):
    if key in self.cache:
        return self.cache[key]  # Oops! Didn't mark as used!
    return -1

Why it's wrong: The item won't be marked as recently used and might get evicted even though you just accessed it!

The Right Way - Optimized with Doubly Linked List

# ✅ CORRECT - O(1) operations with HashMap + Doubly Linked List
# HashMap gives instant lookup
# Doubly Linked List maintains order efficiently
# Head = Most Recent, Tail = Least Recent
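As a sketch of that idea, Python's `collections.OrderedDict` bundles the hash map and the doubly linked list into one structure, giving O(1) `get` and `put` (the class name `FastLRUCache` is ours, not part of the lesson):

```python
from collections import OrderedDict

class FastLRUCache:
    """O(1) LRU: OrderedDict is a hash map backed by a doubly linked list."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)  # mark as most recently used
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used

cache = FastLRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes most recent
cache.put("c", 3)  # evicts "b", the least recently used
print(cache.get("b"))  # -1
```

`move_to_end` and `popitem(last=False)` replace the list `remove`/`pop(0)` calls, which were the O(n) bottleneck in the simple version.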

Common Design Patterns

Intermediate

LFU Cache - The Popularity Contest!

Unlike LRU which tracks recency, LFU (Least Frequently Used) tracks popularity. Think of it like Spotify keeping your most-played songs cached!

LFU Cache - Track What's Hot! 🔥
# LFU - Like trending videos on YouTube!
class SimpleLFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}        # key -> value
        self.frequencies = {}  # key -> use count

    def get(self, key):
        if key in self.cache:
            # Increment frequency (it's getting popular!)
            self.frequencies[key] += 1
            return self.cache[key]
        return -1

    def put(self, key, value):
        if key in self.cache:
            self.cache[key] = value
            self.frequencies[key] += 1
        else:
            if len(self.cache) >= self.capacity:
                # Find least frequently used
                min_freq = min(self.frequencies.values())
                for k, freq in self.frequencies.items():
                    if freq == min_freq:
                        del self.cache[k]
                        del self.frequencies[k]
                        break
            self.cache[key] = value
            self.frequencies[key] = 1

# Example: Popular songs cache
lfu = SimpleLFUCache(2)
lfu.put("song1", "Hit Single")
lfu.put("song2", "New Release")
lfu.get("song1")                 # Played again - frequency = 2
lfu.put("song3", "Viral Track")  # song2 evicted (freq = 1)
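Note that this simple version scans every key to find the minimum frequency, so eviction is O(n). A common O(1) refinement buckets keys by use count and tracks the minimum count directly. A sketch (the class name `LFUCacheO1` and the OrderedDict tie-breaking are our choices):

```python
from collections import defaultdict, OrderedDict

class LFUCacheO1:
    """O(1) LFU sketch: keys grouped into buckets by use count.
    Ties within a frequency are broken LRU-style via OrderedDict order."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}                          # key -> value
        self.freq = {}                           # key -> use count
        self.buckets = defaultdict(OrderedDict)  # count -> keys in LRU order
        self.min_freq = 0

    def _touch(self, key):
        # Move key from its current frequency bucket to the next one up
        f = self.freq[key]
        del self.buckets[f][key]
        if not self.buckets[f] and self.min_freq == f:
            self.min_freq = f + 1
        self.freq[key] = f + 1
        self.buckets[f + 1][key] = None

    def get(self, key):
        if key not in self.cache:
            return -1
        self._touch(key)
        return self.cache[key]

    def put(self, key, value):
        if self.capacity <= 0:
            return
        if key in self.cache:
            self.cache[key] = value
            self._touch(key)
            return
        if len(self.cache) >= self.capacity:
            # Evict the least frequent key; LRU among ties
            victim, _ = self.buckets[self.min_freq].popitem(last=False)
            del self.cache[victim]
            del self.freq[victim]
        self.cache[key] = value
        self.freq[key] = 1
        self.buckets[1][key] = None
        self.min_freq = 1

lfu = LFUCacheO1(2)
lfu.put("song1", "Hit Single")
lfu.put("song2", "New Release")
lfu.get("song1")                 # frequency of song1 is now 2
lfu.put("song3", "Viral Track")  # evicts song2 (lowest frequency)
print(lfu.get("song2"))  # -1
```

Every operation touches only one bucket, so there is no scan over all keys at eviction time.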
Intermediate

Autocomplete System - Type Ahead Magic!

Ever wondered how Google knows what you're typing? It's using a Trie with frequency tracking!

Try It: Autocomplete Demo

Type and see suggestions appear:

Build Google's Autocomplete! 🔍
# Autocomplete - Like Google search suggestions!
class AutocompleteSystem:
    def __init__(self):
        self.trie = {}       # Tree of characters
        self.sentences = {}  # sentence -> frequency
        self.current = ""    # What the user is typing

    def add_sentence(self, sentence, times=1):
        # Add to frequency tracker
        self.sentences[sentence] = self.sentences.get(sentence, 0) + times
        # Add to trie for fast prefix search
        node = self.trie
        for char in sentence:
            if char not in node:
                node[char] = {}
            node = node[char]
        node['#'] = sentence  # Mark end of sentence

    def input(self, c):
        if c == '#':
            # User finished typing - add to history
            self.add_sentence(self.current)
            self.current = ""
            return []
        self.current += c
        return self._get_suggestions(self.current)

    def _get_suggestions(self, prefix):
        # Navigate to prefix in trie
        node = self.trie
        for char in prefix:
            if char not in node:
                return []
            node = node[char]
        # Find all sentences with this prefix
        suggestions = []
        self._dfs(node, suggestions)
        # Sort by frequency (most popular first)
        suggestions.sort(key=lambda x: -self.sentences[x])
        return suggestions[:3]  # Top 3

    def _dfs(self, node, suggestions):
        # Collect every complete sentence below this trie node
        for char, child in node.items():
            if char == '#':
                suggestions.append(child)
            else:
                self._dfs(child, suggestions)

# Example usage
system = AutocompleteSystem()
system.add_sentence("i love you", 5)
system.add_sentence("i love coding", 2)
print(system.input('i'))  # ["i love you", "i love coding"]
print(system.input(' '))  # ["i love you", "i love coding"]
print(system.input('l'))  # ["i love you", "i love coding"]

Practice Problems

Beginner

Problem 1: Time-Based Key-Value Store

Design a key-value store that can retrieve values at specific timestamps.

Think About It First!

It's like Git version history - you want to see what a file looked like at any point in time!

Solution
class TimeMap:
    def __init__(self):
        self.store = {}  # key -> [(timestamp, value)]

    def set(self, key, value, timestamp):
        if key not in self.store:
            self.store[key] = []
        self.store[key].append((timestamp, value))

    def get(self, key, timestamp):
        if key not in self.store:
            return ""
        # Binary search for the latest value at or before timestamp
        values = self.store[key]
        left, right = 0, len(values) - 1
        result = ""
        while left <= right:
            mid = (left + right) // 2
            if values[mid][0] <= timestamp:
                result = values[mid][1]
                left = mid + 1
            else:
                right = mid - 1
        return result

# Test it!
timemap = TimeMap()
timemap.set("user", "Alice", 1)
timemap.set("user", "Bob", 3)
print(timemap.get("user", 2))  # "Alice"
print(timemap.get("user", 4))  # "Bob"
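The hand-rolled binary search can also lean on the standard library's `bisect` module. A sketch (the class name `BisectTimeMap` is ours), assuming timestamps arrive in increasing order:

```python
import bisect

class BisectTimeMap:
    def __init__(self):
        self.store = {}  # key -> ([timestamps], [values]), parallel lists

    def set(self, key, value, timestamp):
        ts, vals = self.store.setdefault(key, ([], []))
        ts.append(timestamp)  # assumes timestamps only increase
        vals.append(value)

    def get(self, key, timestamp):
        if key not in self.store:
            return ""
        ts, vals = self.store[key]
        i = bisect.bisect_right(ts, timestamp)  # count of entries <= timestamp
        return vals[i - 1] if i else ""

tm = BisectTimeMap()
tm.set("user", "Alice", 1)
tm.set("user", "Bob", 3)
print(tm.get("user", 2))  # Alice
print(tm.get("user", 4))  # Bob
```

`bisect_right` returns how many stored timestamps are at or before the query, so `vals[i - 1]` is exactly the value the manual loop finds.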
Intermediate

Problem 2: Design Twitter

Design a simplified version of Twitter with follow, unfollow, and tweet features.

Think About It First!

You need to track users, their tweets, and who follows whom. How would you get a user's feed?

Solution
import heapq
from collections import defaultdict

class Twitter:
    def __init__(self):
        self.time = 0
        self.tweets = defaultdict(list)    # user -> [(time, tweetId)]
        self.followees = defaultdict(set)  # user -> set of followees

    def postTweet(self, userId, tweetId):
        self.tweets[userId].append((self.time, tweetId))
        self.time -= 1  # Negative for max heap

    def getNewsFeed(self, userId):
        # Get tweets from user and followees
        heap = []
        # Add user's own tweets
        if userId in self.tweets:
            for tweet in self.tweets[userId][-10:]:  # Last 10
                heap.append(tweet)
        # Add followees' tweets
        for followee in self.followees[userId]:
            if followee in self.tweets:
                for tweet in self.tweets[followee][-10:]:
                    heap.append(tweet)
        # Get 10 most recent
        heapq.heapify(heap)
        result = []
        while heap and len(result) < 10:
            time, tweetId = heapq.heappop(heap)
            result.append(tweetId)
        return result

    def follow(self, followerId, followeeId):
        if followerId != followeeId:
            self.followees[followerId].add(followeeId)

    def unfollow(self, followerId, followeeId):
        self.followees[followerId].discard(followeeId)
Advanced

Problem 3: Rate Limiter

Design a rate limiter that allows only N requests per minute from each user.

Think About It First!

Think sliding window - you need to track recent requests and remove old ones!

Solution
from collections import deque
import time

class RateLimiter:
    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.requests = {}  # userId -> deque of timestamps

    def is_allowed(self, userId):
        current_time = time.time()
        if userId not in self.requests:
            self.requests[userId] = deque()
        user_requests = self.requests[userId]
        # Remove old requests outside the window
        while user_requests and user_requests[0] < current_time - self.window:
            user_requests.popleft()
        # Check if allowed
        if len(user_requests) < self.max_requests:
            user_requests.append(current_time)
            return True
        return False

# Example: API rate limiting
limiter = RateLimiter(max_requests=5, window_seconds=60)
for i in range(7):
    allowed = limiter.is_allowed("user1")
    print(f"Request {i+1}: {'✅ Allowed' if allowed else '❌ Blocked'}")
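The sliding window is one policy; another classic is the token bucket, which enforces a steady average rate while still allowing short bursts. A minimal single-user sketch (the class and parameter names are ours):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter sketch: tokens refill at a steady rate,
    and each request spends one. Bursts up to `capacity` are allowed."""
    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.refill_rate = refill_per_second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def is_allowed(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, refill_per_second=5 / 60)  # ~5 requests/min
print(bucket.is_allowed())  # True while tokens remain
```

Compared with the sliding window, it stores one float per user instead of a deque of timestamps, at the cost of a slightly fuzzier window boundary.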

Advanced Techniques

Advanced

Consistent Hashing - Distributed Cache Magic!

How does Netflix distribute movies across thousands of servers? Consistent hashing! It minimizes data movement when servers are added or removed.

Build Netflix's CDN! 🎬
# Consistent Hashing - Distribute like Netflix!
import hashlib

class ConsistentHash:
    def __init__(self, nodes=None, virtual_nodes=150):
        """
        Imagine a clock circle (0-360 degrees).
        Servers and data are placed on this circle;
        data goes to the first server clockwise!
        """
        self.virtual_nodes = virtual_nodes  # More = better distribution
        self.ring = {}         # position -> server
        self.sorted_keys = []  # sorted positions
        if nodes:
            for node in nodes:
                self.add_node(node)

    def _hash(self, key):
        # Convert to a position on the ring
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Add multiple virtual nodes for better distribution
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:{i}"
            hash_val = self._hash(virtual_key)
            self.ring[hash_val] = node
            self.sorted_keys.append(hash_val)
        self.sorted_keys.sort()
        print(f"✅ Added server: {node}")

    def remove_node(self, node):
        # Remove all virtual nodes
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:{i}"
            hash_val = self._hash(virtual_key)
            if hash_val in self.ring:
                del self.ring[hash_val]
                self.sorted_keys.remove(hash_val)
        print(f"👋 Removed server: {node}")

    def get_node(self, key):
        # Find which server should handle this key
        if not self.ring:
            return None
        hash_val = self._hash(key)
        # Find first server clockwise
        for key_hash in self.sorted_keys:
            if hash_val <= key_hash:
                return self.ring[key_hash]
        # Wrap around to first server
        return self.ring[self.sorted_keys[0]]

# Example: Distribute movies across servers
cdn = ConsistentHash(["Server-US", "Server-EU", "Server-Asia"])
print(cdn.get_node("Avengers.mp4"))  # Some server, determined by the hash
print(cdn.get_node("StarWars.mp4"))  # Possibly a different server

# Add new server - minimal redistribution!
cdn.add_node("Server-AU")
print(cdn.get_node("Avengers.mp4"))  # Most keys keep their old server
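One detail: `get_node` above scans `sorted_keys` linearly, which is O(n) in the number of virtual nodes. Production implementations usually binary-search the ring for O(log n) lookups. A standalone sketch with the standard library's `bisect` (the function name is ours; it operates on the same `ring`/`sorted_keys` structures):

```python
import bisect
import hashlib

def get_node_fast(ring, sorted_keys, key):
    """O(log n) ring lookup: binary search for the first position >= hash(key),
    wrapping around to the start of the ring if none is found."""
    if not sorted_keys:
        return None
    hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16)
    i = bisect.bisect_left(sorted_keys, hash_val)  # first position >= hash_val
    if i == len(sorted_keys):
        i = 0  # wrap around the circle
    return ring[sorted_keys[i]]
```

`bisect_left` picks the same "first server clockwise" the linear scan would, just without visiting every virtual node.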
Advanced

Bloom Filter - Space-Efficient Set Membership

How does Chrome know if a URL is malicious without storing millions of bad URLs? Bloom filters!

Chrome's Malware Detection! 🛡️
# Bloom Filter - Probabilistic but super efficient!
import hashlib

class BloomFilter:
    def __init__(self, size=1000, hash_count=3):
        """
        Like a bouncer with a fuzzy memory:
        definitely knows who's NOT on the list,
        but might occasionally think someone is on it when they're not.
        """
        self.size = size
        self.hash_count = hash_count
        self.bit_array = [0] * size

    def _hashes(self, item):
        # Generate multiple hash values
        results = []
        for i in range(self.hash_count):
            hash_val = int(hashlib.md5(f"{item}{i}".encode()).hexdigest(), 16)
            results.append(hash_val % self.size)
        return results

    def add(self, item):
        # Set bits at hash positions
        for hash_val in self._hashes(item):
            self.bit_array[hash_val] = 1

    def might_contain(self, item):
        # Check if all bits are set
        for hash_val in self._hashes(item):
            if self.bit_array[hash_val] == 0:
                return False  # Definitely not in set
        return True  # Probably in set (small false positive chance)

# Example: Malicious URL checker
bloom = BloomFilter(size=10000, hash_count=5)

# Add known bad URLs
bad_urls = ["malware.com", "phishing.net", "virus.org"]
for url in bad_urls:
    bloom.add(url)

# Check URLs
print(bloom.might_contain("malware.com"))  # True (flagged as bad)
print(bloom.might_contain("google.com"))   # Almost certainly False (never added)

Quick Reference

Cache Comparison Chart

Cache Type | Eviction Policy       | Use Case                      | Example
LRU        | Least Recently Used   | Recent access patterns matter | Instagram feed, browser cache
LFU        | Least Frequently Used | Popular items should stay     | YouTube trending, CDN
FIFO       | First In, First Out   | Simple, age-based             | Message queues
TTL        | Time To Live          | Data has an expiry            | DNS cache, session tokens
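The TTL row deserves a tiny sketch of its own, since it is the one policy not implemented earlier on this page. A minimal version (class name ours) that expires entries lazily on read:

```python
import time

class TTLCache:
    """TTL cache sketch: every entry expires `ttl_seconds` after it is written.
    Expired entries are dropped lazily when they are next read."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.cache = {}  # key -> (value, expiry time)

    def put(self, key, value):
        self.cache[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.cache.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self.cache[key]  # expired: evict on access
            return None
        return value

sessions = TTLCache(ttl_seconds=60)
sessions.put("session", "token123")
print(sessions.get("session"))  # token123 (read within 60 s of the put)
```

Lazy expiry keeps `put` and `get` O(1); real systems often add a periodic sweep so dead entries don't linger in memory unread.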

System Design Patterns

Pattern            | Complexity | When to Use                | Plain English
Consistent Hashing | O(log n)   | Distributed systems        | ⚡ Minimal data movement
Bloom Filter       | O(k)       | Space-efficient membership | 💾 Super memory efficient
Rate Limiter       | O(1)       | API throttling             | 🚦 Control request flow
Trie               | O(m)       | Prefix search              | 🔍 Fast autocomplete

Common Design Pitfalls

Pitfall #1: Memory Leaks in Cache

Always have a max size and eviction policy. Unbounded caches will crash your system!

Pitfall #2: Not Handling Concurrent Access

In production, use locks or concurrent data structures for thread safety!
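As a sketch of the idea in Python (class name ours), wrapping every operation in a `threading.Lock` keeps the cache's internal ordering consistent under concurrent access:

```python
import threading
from collections import OrderedDict

class ThreadSafeLRUCache:
    """LRU cache guarded by a lock so concurrent readers and writers
    can't corrupt the internal ordering (a minimal sketch)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key not in self.cache:
                return -1
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]

    def put(self, key, value):
        with self.lock:
            if key in self.cache:
                self.cache.move_to_end(key)
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used

# Hammer the cache from several threads: capacity is never exceeded
c = ThreadSafeLRUCache(100)
threads = [threading.Thread(target=lambda: [c.put(i, i) for i in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(c.cache) <= 100)  # True
```

A single lock is the simplest correct option; under heavy read traffic, sharded locks or concurrent maps reduce contention.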

Pitfall #3: Ignoring Cache Invalidation

"There are only two hard things in Computer Science: cache invalidation and naming things." (Phil Karlton)

When the source of truth changes, stale cache entries must be detected and refreshed or evicted. Decide on an invalidation strategy (write-through, explicit invalidation, or TTL-based expiry) before you ship the cache.