Big-O Notation
How to reason about the cost of code without running it. The single most reused idea from DSA in day-to-day ML engineering.
What Big-O actually is
Big-O describes how the runtime (or memory) of an algorithm grows as the
input size n grows. It’s an upper bound on the growth rate, ignoring
constants and low-order terms. O(2n + 100) is O(n). O(n^2 + n) is
O(n^2).
It does not tell you whether your code is fast. It tells you whether it
will still be fast when n doubles. That’s a different and more important
question when you’re running inference on a million users instead of ten.
The complexities you actually see
| Notation | Name | Real-world example |
|---|---|---|
| O(1) | Constant | Dict lookup, array index, HashMap put |
| O(log n) | Logarithmic | Binary search, balanced tree insert |
| O(n) | Linear | One pass over a list, streaming sum |
| O(n log n) | Linearithmic | Comparison sort (sorted, merge sort, quicksort avg) |
| O(n^2) | Quadratic | Naive pairwise similarity, bubble sort, nested loops |
| O(n^3) | Cubic | Naive matrix multiply on two n×n matrices |
| O(2^n) | Exponential | Brute-force subset enumeration, naive recursion |
| O(n!) | Factorial | Brute-force permutations, brute-force traveling salesman |
Rule of thumb for a single-machine Python script with n = 10^6:
- O(n): milliseconds, fine.
- O(n log n): seconds, fine.
- O(n^2): hours, not fine.
- O(2^n): will not finish in your lifetime past n ≈ 30.
Reasoning about nested loops
The default heuristic: count nested loops over the input. One loop
over n is O(n). A loop over m nested inside a loop over n is O(n*m). Be
honest about what m is: if the inner loop also walks the full list, it's O(n^2).
```python
# O(n^2): for every pair of items
for i in items:
    for j in items:
        if similar(i, j):
            ...

# O(n): one pass, constant work per item
seen = set()
for i in items:
    if i in seen:
        ...
    seen.add(i)
```
A classic trick: if you see a nested loop, ask whether a hash set or dict could flatten it. Most “find pairs” problems collapse from O(n²) to O(n) that way.
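A minimal sketch of that collapse, using a two-sum style check (the function names here are illustrative, not from any library):

```python
# Does any pair of numbers sum to target?
# Naive version: check every pair -> O(n^2).
def has_pair_naive(xs, target):
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] + xs[j] == target:
                return True
    return False

# Hash-set version: one pass with O(1) average membership tests -> O(n).
def has_pair_fast(xs, target):
    seen = set()
    for x in xs:
        if target - x in seen:   # have we already seen the complement?
            return True
        seen.add(x)
    return False
```

Both return the same answers; only the inner scan has been replaced by a set lookup.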
Amortized analysis
Some operations are cheap on average even though a single call can be
expensive. list.append is the canonical example: CPython over-allocates
the backing array geometrically when it fills up, so most appends are
O(1) but the occasional resizing append costs O(n). Averaged across n
appends, the total work is O(n), so each append is amortized O(1).
The same reasoning applies to dict resize, set resize, and many
balanced-tree operations. When you see “amortized O(1)”, it means “O(1) on
average, occasional slow calls, fine for throughput, maybe not fine for
tail latency”.
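A toy model makes the arithmetic concrete. This assumes exact doubling, which simplifies CPython's real growth factor, but the amortized argument is the same:

```python
def total_copy_work(n):
    """Count element copies for n appends into an array that
    doubles its capacity when full (toy model, not CPython's
    exact growth policy)."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            copies += size     # copy every element into the new array
            capacity *= 2
        size += 1
    return copies

# The copy totals are 1 + 2 + 4 + ... < 2n, so the average
# cost per append is a constant: amortized O(1).
```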
Python builtins — the cheat table
| Operation | Complexity |
|---|---|
| list[i] read | O(1) |
| list.append(x) | O(1) amortized |
| list.insert(0, x) | O(n) |
| x in list (unsorted) | O(n) |
| sorted(list) | O(n log n) |
| dict[k] get / set / delete | O(1) average |
| x in dict / x in set | O(1) average |
| set.intersection(other) | O(min(n, m)) |
| deque.appendleft / popleft | O(1) |
| heapq.heappush / heappop | O(log n) |
| str + str (repeated in loop) | O(n^2) (bad!) |
| "".join(list_of_str) | O(total len) |
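The string rows deserve a sketch, because the quadratic version looks harmless. (Function names are illustrative; CPython sometimes optimizes += on strings in place, so treat the slow version as the general case rather than a guarantee.)

```python
# O(n^2) in total length: each += may copy the whole string built so far.
def concat_slow(parts):
    out = ""
    for p in parts:
        out += p
    return out

# O(total length): join measures once, allocates once, copies once.
def concat_fast(parts):
    return "".join(parts)
```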
The trap: x in list looks innocent but is O(n). Converting to a set
once and then testing membership is the single most common "make Python
fast" trick.
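A minimal sketch of the trick, on illustrative data:

```python
words = ["cat", "dog", "bird"] * 1000   # 3000 items
lookups = ["dog", "fish"] * 500         # 1000 membership tests

# O(len(words)) per test: Python scans the list each time.
slow_hits = sum(1 for w in lookups if w in words)

# Convert once (O(n)), then each membership test is O(1) average.
word_set = set(words)
fast_hits = sum(1 for w in lookups if w in word_set)

assert slow_hits == fast_hits  # same answer, far less work
```

The conversion itself costs one O(n) pass, so it pays off as soon as you do more than a handful of lookups.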
Identify-the-complexity drills
```python
# 1
def f1(xs):
    total = 0
    for x in xs:
        total += x
    return total
```

O(n): one pass.
```python
# 2
def f2(xs):
    for x in xs:
        for y in xs:
            print(x, y)
```

O(n^2): nested over the same list.
```python
# 3
import bisect

def f3(xs):
    xs = sorted(xs)
    for x in xs:
        bisect.bisect_left(xs, x)
```

O(n log n): the sort dominates, and the loop adds another n * log n.
```python
# 4
def f4(xs):
    seen = {}
    for x in xs:
        seen[x] = seen.get(x, 0) + 1
    return seen
```

O(n): hash table ops are amortized O(1).
```python
# 5
def f5(n):
    if n <= 1:
        return n
    return f5(n - 1) + f5(n - 2)
```

O(2^n): naive Fibonacci, exponential because every call branches twice.
With memoization this drops to O(n).
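One way to get that drop is functools.lru_cache, which memoizes by argument:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once and cached, so the
    # exponential call tree collapses to O(n) total work.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
```

fib(80) now returns instantly, where the naive version would need on the order of 2^80 calls.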
Why this matters in ML engineering
- Embedding lookups must be O(1) or your RAG is slow. Use a dict or FAISS index, never a list scan.
- Pairwise similarity is O(n²) naive: fine for 1k docs, impossible at 1M. That's why vector databases exist.
- Pandas apply on a column is O(n) but with huge constants; vectorize with NumPy or use Polars.
- LLM attention is (roughly) O(n²) in sequence length, which is why context windows were hard-capped and why FlashAttention mattered.
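The constant-factor point is easy to see with plain NumPy (the normalize helpers here are hypothetical names, and NumPy is assumed to be installed): both versions are O(n), but one pays interpreter overhead per element and the other runs as a few C-level passes.

```python
import numpy as np

x = np.random.rand(100_000)

# Python-level loop: O(n), but every element pays interpreter overhead.
def normalize_loop(x):
    m, s = x.mean(), x.std()
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = (x[i] - m) / s
    return out

# Vectorized: the same O(n), executed in compiled code.
def normalize_vec(x):
    return (x - x.mean()) / x.std()

assert np.allclose(normalize_loop(x), normalize_vec(x))
```

Same shape, same answer; the vectorized version is typically orders of magnitude faster in wall-clock time.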
If you only remember one thing: when n is large, constants stop
mattering and shape matters. Always know the shape of your hot loop.