Big-O Notation

How to reason about the cost of code without running it. The single most reused idea from DSA in day-to-day ML engineering.

Data Structures & Algorithms · beginner · #dsa #complexity #interview
Prereqs: Basic Python

What Big-O actually is

Big-O describes how the runtime (or memory) of an algorithm grows as the input size n grows. It’s an upper bound on the growth rate, ignoring constants and low-order terms. O(2n + 100) is O(n). O(n^2 + n) is O(n^2).

It does not tell you whether your code is fast. It tells you whether it will still be fast when n doubles. That’s a different and more important question when you’re running inference on a million users instead of ten.
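To make the "what happens when n doubles" question concrete, here's a small sketch (the step-counting functions are illustrative, not a real benchmark) that counts operations for a linear and a quadratic routine at two input sizes:

```python
def linear_steps(n):
    # One pass: work grows proportionally to n.
    steps = 0
    for _ in range(n):
        steps += 1
    return steps

def quadratic_steps(n):
    # Nested pass: work grows with n * n.
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps

# Doubling n doubles the linear count but quadruples the quadratic one.
print(linear_steps(1000), linear_steps(2000))        # 1000 2000
print(quadratic_steps(1000), quadratic_steps(2000))  # 1000000 4000000
```

The absolute numbers don't matter; the ratio between the two input sizes is what Big-O predicts.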

The complexities you actually see

| Notation | Name | Real-world example |
| --- | --- | --- |
| O(1) | Constant | Dict lookup, array index, HashMap put |
| O(log n) | Logarithmic | Binary search, balanced tree insert |
| O(n) | Linear | One pass over a list, streaming sum |
| O(n log n) | Linearithmic | Comparison sort (sorted, merge sort, quicksort average) |
| O(n^2) | Quadratic | Naive pairwise similarity, bubble sort, nested loops |
| O(n^3) | Cubic | Naive matrix multiply on two n×n matrices |
| O(2^n) | Exponential | Brute-force subset enumeration, naive recursion |
| O(n!) | Factorial | Brute-force permutations, brute-force traveling salesman |

Rule of thumb for a single-machine Python script with n = 10^6:

  • O(n) — milliseconds, fine.
  • O(n log n) — seconds, fine.
  • O(n^2) — hours, not fine.
  • O(2^n) — will not finish in your lifetime past n ≈ 30.
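Those thresholds fall out of simple operation counting. Assuming (very roughly) about 10^7 simple pure-Python operations per second — an assumed figure, real throughput varies widely — a back-of-envelope sketch:

```python
import math

n = 10**6
ops_per_sec = 10**7  # assumed throughput for pure-Python work

for label, ops in [
    ("O(n)", n),
    ("O(n log n)", n * math.log2(n)),
    ("O(n^2)", n**2),
]:
    # Estimated wall time = total operations / operations per second.
    print(f"{label:>12}: ~{ops / ops_per_sec:,.0f} seconds")
```

O(n^2) at n = 10^6 works out to ~10^5 seconds, i.e. more than a day — which is where "hours, not fine" comes from.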

Reasoning about nested loops

The default heuristic: count nested loops over the input. One loop over n is O(n). A loop inside a loop is O(n*m). Be honest about what m is — if it’s also the full list, it’s O(n^2).

# O(n^2): for every pair of items
for i in items:
    for j in items:
        if similar(i, j):
            ...

# O(n): one pass, constant work per item
seen = set()
for i in items:
    if i in seen:
        ...
    seen.add(i)

A classic trick: if you see a nested loop, ask whether a hash set or dict could flatten it. Most “find pairs” problems collapse from O(n^2) to O(n) that way.
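A concrete instance of that collapse (function names here are illustrative): checking whether any two numbers sum to a target, first with nested loops, then with a set.

```python
def has_pair_quadratic(xs, target):
    # O(n^2): check every pair explicitly.
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] + xs[j] == target:
                return True
    return False

def has_pair_linear(xs, target):
    # O(n): one pass; the set answers "have I seen the complement?" in O(1).
    seen = set()
    for x in xs:
        if target - x in seen:
            return True
        seen.add(x)
    return False

print(has_pair_linear([3, 9, 12, 20], 21))  # True (9 + 12)
```

The inner loop's question ("is there an element equal to target - x?") is exactly what a hash set answers in constant time, so the second loop disappears.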

Amortized analysis

Some operations are cheap on average even though a single call can be expensive. list.append is the canonical example: CPython over-allocates the list’s backing array and grows it geometrically when it fills up, so most appends are O(1) but an occasional append triggers an O(n) copy into a bigger array. Averaged across n appends, the total work is O(n), so each append is amortized O(1).

The same reasoning applies to dict and set resizes. When you see “amortized O(1)”, it means “O(1) on average, occasional slow calls, fine for throughput, maybe not fine for tail latency”.

Python builtins — the cheat table

| Operation | Complexity |
| --- | --- |
| list[i] read | O(1) |
| list.append(x) | O(1) amortized |
| list.insert(0, x) | O(n) |
| x in list (unsorted) | O(n) |
| sorted(list) | O(n log n) |
| dict[k] get / set / delete | O(1) average |
| x in dict / x in set | O(1) average |
| set.intersection(other) | O(min(n, m)) |
| deque.appendleft / popleft | O(1) |
| heapq.heappush / heappop | O(log n) |
| str + str (repeated in loop) | O(n^2) — bad! |
| "".join(list_of_str) | O(total len) |

The trap: x in list looks innocent but is O(n). Converting to set once and then testing membership is the single most common “make Python fast” trick.
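A minimal timing sketch of that trick (absolute timings vary by machine; the ratio is the point):

```python
import timeit

data_list = list(range(100_000))
data_set = set(data_list)
needle = 99_999  # worst case for the list: the element is at the end

# O(n) scan per lookup vs O(1) average hash lookup.
t_list = timeit.timeit(lambda: needle in data_list, number=100)
t_set = timeit.timeit(lambda: needle in data_set, number=100)

print(f"list: {t_list:.4f}s  set: {t_set:.6f}s")
```

The one-time set() conversion is itself O(n), so it pays off as soon as you do more than a handful of membership tests.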

Identify-the-complexity drills

# 1
def f1(xs):
    total = 0
    for x in xs:
        total += x
    return total

O(n) — one pass.

# 2
def f2(xs):
    for x in xs:
        for y in xs:
            print(x, y)

O(n^2) — nested over the same list.

# 3
import bisect

def f3(xs):
    xs = sorted(xs)
    for x in xs:
        bisect.bisect_left(xs, x)

O(n log n) — the sort is O(n log n), and the loop does n binary searches at O(log n) each, so both terms are O(n log n).

# 4
def f4(xs):
    seen = {}
    for x in xs:
        seen[x] = seen.get(x, 0) + 1
    return seen

O(n) — hash table ops are amortized O(1).

# 5
def f5(n):
    if n <= 1:
        return n
    return f5(n - 1) + f5(n - 2)

O(2^n) — naive Fibonacci, exponential because every call branches into two more calls (the tight bound is about O(1.618^n), but it's exponential either way). With memoization this drops to O(n).
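The memoized version, for comparison: functools.lru_cache stores each result, so every distinct n is computed once and the exponential call tree collapses to O(n) calls.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once, then served from the cache,
    # so the call tree has O(n) nodes instead of O(2^n).
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(90))  # instant; the naive version would never finish
```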

Why this matters in ML engineering

  • Embedding lookups should be O(1): key id → vector in a dict (or use an ANN index such as FAISS for similarity search), never a linear scan over a list.
  • Pairwise similarity is O(n^2) naive — fine for 1k docs, impossible at 1M. That’s why vector databases exist.
  • Pandas apply on a column is O(n) but with huge constants — vectorize with NumPy or use Polars.
  • LLM context is (roughly) O(n²) in sequence length for attention, which is why context windows were hard-capped and why FlashAttention mattered.

If you only remember one thing: when n is large, constants stop mattering and shape matters. Always know the shape of your hot loop.