Pattern Matching Algorithms
Master the art of finding patterns in text! From simple search to advanced algorithms like KMP and Boyer-Moore, learn how search engines, DNA analyzers, and text editors find what they're looking for.
Why Should You Care About Pattern Matching?
Pattern matching is everywhere in modern technology! Let's see where you encounter it daily:
Text Editors & IDEs
When you press Ctrl+F to find text, or when your IDE highlights syntax errors, that's pattern matching in action! VS Code uses Boyer-Moore for fast text search.
DNA Sequencing
Scientists use pattern matching to find specific gene sequences in DNA strands. A single human genome has 3 billion base pairs - imagine finding a specific sequence without efficient algorithms!
Search Engines
Google processes billions of searches daily. They use advanced pattern matching to find your query in trillions of web pages. Without efficient algorithms, searches would take forever!
Security & Antivirus
Antivirus software scans files for known virus signatures using pattern matching. They need to check millions of patterns quickly without slowing down your computer.
Plagiarism Detection
Tools like Turnitin use sophisticated pattern matching to compare submitted work against billions of documents to detect potential plagiarism.
The Basics - Let's Start Simple!
What is Pattern Matching?
Imagine you're reading a book and looking for every occurrence of the word "magic". Pattern matching is teaching a computer to do exactly that - find a specific pattern (like "magic") in a larger text (like your book).
Try It: Naive Pattern Matching
See how the simplest algorithm checks every position:
How It Works - Step by Step
Common Mistake: Off-by-One Error
Why it's wrong: If text="ABC" and pattern="BC", we need to check position 1, but range(3-2) only gives us position 0!
The Right Way
Common Pattern Matching Algorithms
KMP Algorithm - The Smart Skipper
KMP (Knuth-Morris-Pratt) is like having a smart bookmark that remembers partial matches!
Interactive KMP Visualizer
See how KMP builds its "failure function" to skip unnecessary comparisons:
Rabin-Karp - The Hash Detective
Instead of comparing characters, Rabin-Karp uses math! It calculates a "fingerprint" (hash) for the pattern and checks if any part of the text has the same fingerprint.
Rolling Hash Calculator
See how the hash "rolls" as we slide the window:
Boyer-Moore - The Backwards Genius
Boyer-Moore reads the pattern backwards! When it finds a mismatch, it can skip many positions at once.
Practice Problems
Problem 1: Count Pattern Occurrences
Write a function that counts how many times a pattern appears in a text (including overlapping occurrences).
Think About It First!
If text = "AAAA" and pattern = "AA", how many occurrences are there?
Hint: Don't skip ahead after finding a match!
Show Solution
Problem 2: Find All Anagrams
Find all starting positions in text where an anagram of the pattern exists.
Think About It First!
An anagram has the same characters but in any order. How can you check this efficiently?
Show Solution
Problem 3: Longest Repeating Substring
Find the longest substring that appears at least twice in the text.
Think About It First!
You'll need to check all possible substrings. How can you optimize this?
Show Solution
Advanced Algorithms
Z-Algorithm - The Prefix Master
The Z-algorithm creates an array where Z[i] tells you the length of the longest substring starting from position i that matches a prefix of the string.
Aho-Corasick - Multiple Pattern Search Champion
When you need to find multiple patterns at once (like virus signatures or banned words), Aho-Corasick builds a trie of all patterns and searches them simultaneously!
Quick Reference
Algorithm Comparison Chart
| Algorithm | Time Complexity | When to Use | Plain English Speed |
|---|---|---|---|
Naive |
O(n × m) | Short patterns, simple cases | 🐢 Slow but simple |
KMP |
O(n + m) | Need guaranteed linear time | ⚡ Fast and consistent |
Rabin-Karp |
O(n + m) average | Multiple pattern search | 🎲 Usually fast |
Boyer-Moore |
O(n/m) best case | Long patterns, large alphabet | 🚀 Super fast for long patterns |
Z-Algorithm |
O(n + m) | Need prefix matches | ⚡ Linear and elegant |
Aho-Corasick |
O(n + m + z) | Multiple patterns at once | 💫 Find everything in one pass |
Use Naive When...
- Pattern is very short (< 10 chars)
- Text is small
- Simplicity matters most
- One-time search
Use KMP When...
- Pattern has repetitive parts
- Need guaranteed O(n) time
- Streaming data
- Can't go backwards
Use Boyer-Moore When...
- Pattern is long
- Large alphabet (like English)
- Speed is critical
- Text is very long
Use Rabin-Karp When...
- Searching multiple patterns
- Pattern matching in 2D
- Plagiarism detection
- Rolling hash useful
Common Pitfalls to Avoid
Pitfall #1: Hash Collisions in Rabin-Karp
Always double-check matches character by character when hashes match!
Pitfall #2: Off-by-One in Loop Bounds
Remember: range(len(text) - len(pattern) + 1) not just - len(pattern)
Pitfall #3: Not Handling Empty Patterns
Always check if pattern is empty before searching!