Snowflake

Cloud data warehouse with separated storage and compute. SQL-first, low-ops, and the BI team's favorite.

Category
Data Warehouses
Difficulty
Beginner
When to use
You want a no-ops analytics warehouse with strong governance, concurrent BI workloads, and simple SQL semantics.
When not to use
You need to train deep learning models on raw data, work with unstructured media, or ship streaming pipelines.
Alternatives
Databricks BigQuery Redshift DuckDB (for single-node)

At a glance

FieldValue
CategoryData Warehouse
DifficultyBeginner (SQL) → Intermediate (Snowpark / governance)
When to useAnalytics, BI, governed reporting, ELT output, embedded analytics
When not to useHeavy ML training, unstructured data, low-latency OLTP
AlternativesDatabricks, BigQuery, Redshift, DuckDB

Virtual warehouses

A “warehouse” in Snowflake is not the data — it’s the compute. Storage lives in S3/GCS/Azure Blob, managed by Snowflake. Compute is a stateless cluster you spin up on demand.

  • Sized T-shirt style: XS6XL. Each step doubles nodes and cost.
  • Auto-suspend when idle (default 10 min). Auto-resume on first query.
  • Independent warehouses can query the same data concurrently — the ETL job on a L warehouse doesn’t fight the BI dashboards on a XS.

This separation is the Snowflake superpower: you never pay for compute you don’t use, and you never queue behind another team.

Micro-partitions

Snowflake stores tables as immutable ~16 MB compressed columnar micro-partitions. It automatically tracks per-partition min/max on every column. That metadata powers pruning — most queries only scan a tiny fraction of the data.

You don’t create or manage these. You influence pruning by:

  • Clustering keys on very large tables that need predictable pruning.
  • Filtering on high-cardinality columns the optimizer already indexed.
  • Avoiding SELECT * on wide tables — columnar scan cost scales with columns read.

Check pruning with:

SELECT * FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_SESSION())
ORDER BY START_TIME DESC;

and look at PARTITIONS_SCANNED vs PARTITIONS_TOTAL.

Snowpark

Snowpark lets you run Python, Java, or Scala inside Snowflake warehouses. DataFrame API that looks like PySpark, with UDFs and stored procedures that execute on the warehouse compute. Handy for feature engineering when you don’t want to ship data out to a separate cluster.

Caveats: you’re constrained to what the Snowpark Python runtime ships (no arbitrary C extensions, limited package set), and heavy ML training is still better done elsewhere.

When Snowflake wins

  • Concurrent BI dashboards. 50 analysts hitting Tableau simultaneously? Spin up a dedicated warehouse for BI and the rest of your workloads are unaffected.
  • Governance and compliance. Row-access policies, masking policies, object tagging, and Horizon Catalog are mature. Auditors like Snowflake.
  • Data sharing. Share a live dataset with a partner across Snowflake accounts without copying it.
  • Zero-copy clones. CREATE TABLE staging CLONE prod; is instant. Great for dev environments off prod data.

When Snowflake loses

  • Heavy ML training. Snowpark ML exists, but for real DL training on GPUs, Databricks or a dedicated training stack wins.
  • Unstructured data. Images, audio, binary blobs: tolerable via directory tables, not pleasant.
  • Streaming. Snowpipe and dynamic tables are catching up but a Kafka → Spark → Delta pipeline still beats them for sub-minute latency.
  • Cost control under sloppy usage. SELECT * with no filters on a big table on a 2XL warehouse is an expensive mistake. Resource monitors and query tags are mandatory, not optional.

Cost model in one paragraph

You pay for storage (cheap, per TB-month) and compute (per-second billing on the running warehouse, charged in credits). Credits cost depends on edition (Standard → Business Critical) and cloud region. A warehouse at rest costs nothing; a warehouse at XL running continuously is the fastest way to burn budget. Set auto-suspend aggressively, use resource monitors, tag queries by team, and review top-N expensive queries weekly.

Related tools