Big-O Notation
How to reason about the cost of code without running it. The single most reused idea from DSA in day-to-day ML engineering.
What Big-O actually is
Big-O describes how the runtime (or memory) of an algorithm grows as the
input size n grows. It’s an upper bound on the growth rate, ignoring
constants and low-order terms. O(2n + 100) is O(n). O(n^2 + n) is
O(n^2).
It does not tell you whether your code is fast. It tells you whether it
will still be fast when n doubles. That’s a different and more important
question when you’re running inference on a million users instead of ten.
The complexities you actually see
| Notation | Name | Real-world example |
|---|---|---|
| O(1) | Constant | Dict lookup, array index, HashMap put |
| O(log n) | Logarithmic | Binary search, balanced tree insert |
| O(n) | Linear | One pass over a list, streaming sum |
| O(n log n) | Linearithmic | Comparison sort (sorted, merge sort, quicksort avg) |
| O(n^2) | Quadratic | Naive pairwise similarity, bubble sort, nested loops |
| O(n^3) | Cubic | Naive matrix multiply on two n×n matrices |
| O(2^n) | Exponential | Brute-force subset enumeration, naive recursion |
| O(n!) | Factorial | Brute-force permutations, brute-force traveling salesman |
Rule of thumb for a single-machine Python script with n = 10^6:
- O(n): milliseconds, fine.
- O(n log n): seconds, fine.
- O(n^2): hours, not fine.
- O(2^n): will not finish in your lifetime past n ≈ 30.
Reasoning about nested loops
The default heuristic: count nested loops over the input. One loop
over n is O(n). A loop over m nested inside a loop over n is O(n*m). Be
honest about what m is: if the inner loop also walks the full list, it's O(n^2).
```python
# O(n^2): for every pair of items
for i in items:
    for j in items:
        if similar(i, j):
            ...

# O(n): one pass, constant work per item
seen = set()
for i in items:
    if i in seen:
        ...
    seen.add(i)
```
A classic trick: if you see a nested loop, ask whether a hash set or dict could flatten it. Most “find pairs” problems collapse from O(n²) to O(n) that way.
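A minimal sketch of that collapse, using a two-sum style check (the function names here are illustrative, not from any library):

```python
# Does any pair of numbers sum to target?
# Naive version: check every pair -> O(n^2).
def has_pair_naive(xs, target):
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] + xs[j] == target:
                return True
    return False

# Hash-set version: one pass with O(1) average membership tests -> O(n).
def has_pair_fast(xs, target):
    seen = set()
    for x in xs:
        if target - x in seen:   # have we already seen the complement?
            return True
        seen.add(x)
    return False
```

Both return the same answers; only the inner scan has been replaced by a set lookup.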
Amortized analysis
Some operations are cheap on average even though a single call can be
expensive. list.append is the canonical example: CPython over-allocates
the backing array geometrically when it fills up, so most appends are
O(1) but the occasional resizing append costs O(n). Averaged across n
appends, the total work is O(n), so each append is amortized O(1).
The same reasoning applies to dict resize, set resize, and many
balanced-tree operations. When you see “amortized O(1)”, it means “O(1) on
average, occasional slow calls, fine for throughput, maybe not fine for
tail latency”.
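A toy model makes the arithmetic concrete. This assumes exact doubling, which simplifies CPython's real growth factor, but the amortized argument is the same:

```python
def total_copy_work(n):
    """Count element copies for n appends into an array that
    doubles its capacity when full (toy model, not CPython's
    exact growth policy)."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size     # copy every element into the new array
            capacity *= 2
        size += 1
    return copies

# The copy totals are 1 + 2 + 4 + ... < 2n, so the average
# cost per append is a constant: amortized O(1).
```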
Python builtins — the cheat table
| Operation | Complexity |
|---|---|
| list[i] read | O(1) |
| list.append(x) | O(1) amortized |
| list.insert(0, x) | O(n) |
| x in list (unsorted) | O(n) |
| sorted(list) | O(n log n) |
| dict[k] get / set / delete | O(1) average |
| x in dict / x in set | O(1) average |
| set.intersection(other) | O(min(n, m)) |
| deque.appendleft / popleft | O(1) |
| heapq.heappush / heappop | O(log n) |
| str + str (repeated in loop) | O(n^2) (bad!) |
| "".join(list_of_str) | O(total len) |
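The string rows deserve a sketch, because the quadratic version looks harmless. (Function names are illustrative; CPython sometimes optimizes += on strings in place, so treat the slow version as the general case rather than a guarantee.)

```python
# O(n^2) in total length: each += may copy the whole string built so far.
def concat_slow(parts):
    out = ""
    for p in parts:
        out += p
    return out

# O(total length): join measures once, allocates once, copies once.
def concat_fast(parts):
    return "".join(parts)
```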
The trap: x in list looks innocent but is O(n). Converting to a set
once and then testing membership is the single most common "make Python
fast" trick.
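A minimal sketch of the trick, on illustrative data:

```python
words = ["cat", "dog", "bird"] * 1000   # 3000 items
lookups = ["dog", "fish"] * 500         # 1000 membership tests

# O(len(words)) per test: Python scans the list each time.
slow_hits = sum(1 for w in lookups if w in words)

# Convert once (O(n)), then each membership test is O(1) average.
word_set = set(words)
fast_hits = sum(1 for w in lookups if w in word_set)

assert slow_hits == fast_hits  # same answer, far less work
```

The conversion itself costs one O(n) pass, so it pays off as soon as you do more than a handful of lookups.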
Identify-the-complexity drills
```python
# 1
def f1(xs):
    total = 0
    for x in xs:
        total += x
    return total
```

O(n): one pass.
```python
# 2
def f2(xs):
    for x in xs:
        for y in xs:
            print(x, y)
```

O(n^2): nested over the same list.
```python
# 3
import bisect

def f3(xs):
    xs = sorted(xs)
    for x in xs:
        bisect.bisect_left(xs, x)
```

O(n log n): the sort dominates, and the loop adds another n * log n.
```python
# 4
def f4(xs):
    seen = {}
    for x in xs:
        seen[x] = seen.get(x, 0) + 1
    return seen
```

O(n): hash table ops are amortized O(1).
```python
# 5
def f5(n):
    if n <= 1:
        return n
    return f5(n - 1) + f5(n - 2)
```

O(2^n): naive Fibonacci, exponential because every call branches twice.
With memoization this drops to O(n).
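One way to get that drop is functools.lru_cache, which memoizes by argument:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once and cached, so the
    # exponential call tree collapses to O(n) total work.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
```

fib(80) now returns instantly, where the naive version would need on the order of 2^80 calls.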
Why this matters in ML engineering
- Embedding lookups must be O(1) or your RAG is slow. Use a dict or FAISS index, never a list scan.
- Pairwise similarity is O(n²) naive: fine for 1k docs, impossible at 1M. That's why vector databases exist.
- Pandas apply on a column is O(n) but with huge constants; vectorize with NumPy or use Polars.
- LLM attention is (roughly) O(n²) in sequence length, which is why context windows were hard-capped and why FlashAttention mattered.
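The constant-factor point is easy to see with plain NumPy (the normalize helpers here are hypothetical names, and NumPy is assumed to be installed): both versions are O(n), but one pays interpreter overhead per element and the other runs as a few C-level passes.

```python
import numpy as np

x = np.random.rand(100_000)

# Python-level loop: O(n), but every element pays interpreter overhead.
def normalize_loop(x):
    m, s = x.mean(), x.std()
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = (x[i] - m) / s
    return out

# Vectorized: the same O(n), executed in compiled code.
def normalize_vec(x):
    return (x - x.mean()) / x.std()

assert np.allclose(normalize_loop(x), normalize_vec(x))
```

Same shape, same answer; the vectorized version is typically orders of magnitude faster in wall-clock time.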
If you only remember one thing: when n is large, constants stop
mattering and shape matters. Always know the shape of your hot loop.