Weaviate

Open-source vector database with built-in hybrid search, modules for embedding generation, and strong filtering.

Category
Vector Databases
Difficulty
Intermediate
When to use
You want an open-source, self-hostable vector store with hybrid (vector + keyword) search out of the box.
When not to use
You need the simplest possible setup and Chroma or pgvector would do the job.
Alternatives
Pinecone, Qdrant, Milvus, pgvector

At a glance

Field            Value
Category         Open-source vector database
Difficulty       Intermediate
When to use      Self-hosted RAG with hybrid search
When not to use  Minimal setups where simpler stores work
Alternatives     Pinecone, Qdrant, Milvus, pgvector

What it is

Weaviate stores objects with properties and (optionally) vector embeddings. It supports BM25 keyword search, vector search, and hybrid search that combines the two. Its module system can generate embeddings at ingestion time via OpenAI, Cohere, Hugging Face, or local models, so you don’t have to run a separate embedding step. GraphQL and REST APIs, filtering on object properties, and multi-tenancy are included.
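To make "hybrid search that combines them" concrete, here is a minimal pure-Python sketch of blending keyword and vector scores with a single weight. It is an illustration only, not Weaviate's internal implementation: Weaviate's hybrid query exposes an `alpha` parameter with the same meaning (0 is keyword only, 1 is vector only), but the function names, the min-max normalization, and the sample scores below are assumptions made for this example.

```python
def min_max_normalize(scores):
    """Scale scores to [0, 1]; if all scores are equal, map them all to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_hybrid(keyword_scores, vector_scores, alpha=0.5):
    """Blend normalized keyword and vector scores.

    alpha=0 uses only keyword scores, alpha=1 only vector scores.
    A document missing from one result list contributes 0 for that side.
    """
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    fused = {
        doc_id: (1 - alpha) * kw.get(doc_id, 0.0) + alpha * vec.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for a handful of documents.
bm25 = {"refund-policy": 7.2, "shipping-faq": 3.1, "returns-form": 1.0}
similarity = {"refund-policy": 0.81, "returns-form": 0.78, "billing-faq": 0.40}
ranking = fuse_hybrid(bm25, similarity, alpha=0.5)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

A document that scores well on only one side (here `returns-form`, which is weak on keywords but close in vector space) can still rank above one that matches keywords moderately, which is the behavior that makes hybrid search useful for exact-term domains.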

When we reach for it at Ephizen

  • Self-hosted deployments where data can’t leave our infrastructure.
  • Workloads where hybrid search outperforms pure vector search (legal, docs, code).
  • Multi-tenant systems that need isolated namespaces.

Getting started

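import weaviate

# Assumes a local Weaviate instance and an existing "Docs" collection (v4 client).
client = weaviate.connect_to_local()
try:
    docs = client.collections.get("Docs")
    docs.data.insert({"content": "Refund policy...", "tags": ["faq"]})
    res = docs.query.hybrid(query="how do refunds work?", limit=5)
finally:
    client.close()  # release the connection when done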
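The snippet above uses the v4 client API (`connect_to_local`, `client.collections`), so it fails on a v3 install. A small check like the following can confirm which major version is present before you run it. The helper names are hypothetical; the only assumption about the environment is that the client is installed under the PyPI package name `weaviate-client`.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '4.9.3'."""
    return int(version_string.split(".")[0])


def installed_client_major(package="weaviate-client"):
    """Return the installed package's major version, or None if not installed."""
    try:
        return major_version(version(package))
    except PackageNotFoundError:
        return None


major = installed_client_major()
if major is None:
    print("weaviate-client is not installed")
elif major < 4:
    print(f"weaviate-client v{major} found; connect_to_local() needs v4 or later")
else:
    print(f"weaviate-client v{major} found")
```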

Gotchas

  • The v3 → v4 Python client was a breaking change. Make sure docs and code match the version you installed.
  • Module-based embedding ties your index to a specific provider; bring-your-own-vectors is cleaner for migrations.
  • HNSW indexes are held in memory, so size each node's RAM for the vectors it will hold.
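For the memory gotcha, a back-of-the-envelope estimate helps when picking a node size. The sketch below multiplies the raw float32 vector size by an overhead factor for graph links and headroom. The function name is hypothetical and the default factor of 2 is an assumed rule of thumb, not a measured value; treat the result as a starting point and verify against real memory usage on your own data.

```python
def estimate_hnsw_memory_bytes(num_vectors, dimensions,
                               bytes_per_value=4, overhead_factor=2.0):
    """Rough RAM estimate for an in-memory HNSW index.

    raw size = num_vectors * dimensions * bytes_per_value (float32 = 4 bytes)
    overhead_factor is an ASSUMED multiplier for graph links and headroom.
    """
    raw = num_vectors * dimensions * bytes_per_value
    return int(raw * overhead_factor)


# Example: one million 768-dimensional embeddings.
estimate = estimate_hnsw_memory_bytes(1_000_000, 768)
print(f"{estimate / 1024**3:.1f} GiB")
```

Under these assumptions, one million 768-dimensional vectors come to roughly 5.7 GiB, before counting the stored object properties themselves.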

Related tools