Migrating Vector Embeddings from PostgreSQL to Qdrant: Challenges, Learnings, and Insights
Migrating half a terabyte of data is no small feat, especially when the clock is ticking and disk space is running out. Over the past few days, I’ve been working on moving the vector embeddings that serve https://infobud.news, stored in PostgreSQL via the pgvectors extension, to a Qdrant node. The node, hosted on dedicated hardware with GPU support at Hetzner, will significantly improve our ability to manage and query vector-based data, as well as calculate new embeddings using certain open-source models. However, the process has been anything but straightforward.
This blog post is a breakdown of the challenges I encountered, the strategies I used to overcome them, and the lessons learned for future projects.
The Setup
The goal of this migration is to transfer embeddings and their associated metadata from PostgreSQL to Qdrant via API calls. Successful transfers trigger an automated deletion of the corresponding records in PostgreSQL to free up disk space.
But here’s the twist: our PostgreSQL instance is almost out of disk space and cannot be extended. This adds significant pressure, as the system must continue to handle live operations while the migration is underway.
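The heavy lifting (reading batches and upserting them into Qdrant over its API) happens outside the database, so on the PostgreSQL side the cleanup step boils down to a batched delete. As a rough sketch, assuming the embeddings sit in vectors.embeddings_summary with a numeric id column and that the migration job passes in the ids Qdrant has acknowledged (the ids below are placeholders):
-- Remove only the rows whose ids Qdrant has confirmed as upserted
DELETE FROM vectors.embeddings_summary
WHERE id = ANY(ARRAY[101, 102, 103])
RETURNING id;
The RETURNING clause lets the migration job verify that the number of deleted rows matches the batch it just sent before moving on.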
The Challenges
1. Disk Space Management: Reclaiming Space with VACUUM
In PostgreSQL, deleted rows don’t immediately free up space; they leave “dead tuples” behind. To reclaim this space, I had to constantly run VACUUM on the affected tables and partitions during the migration.
The results were incremental: a single VACUUM pass would sometimes reclaim only a few megabytes of space. Still, that was enough to keep the system accepting new incoming vectors.
To identify tables containing dead tuples, I use the following query:
SELECT relname AS table_name,
n_live_tup AS live_tuples,
n_dead_tup AS dead_tuples
FROM pg_stat_all_tables
WHERE schemaname = 'vectors';
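When many partitions are bloated at once, sorting by dead tuples helps decide which VACUUM to run first; a small variation of the query above (purely illustrative):
SELECT relname, n_dead_tup
FROM pg_stat_all_tables
WHERE schemaname = 'vectors'
ORDER BY n_dead_tup DESC
LIMIT 10;
The worst offender can then be vacuumed on its own, for example VACUUM (VERBOSE) vectors.embeddings_summary;, which keeps each pass as small and targeted as possible.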
2. Shared Memory Constraints
VACUUM operations can fail when PostgreSQL runs out of shared memory. When this happened, my only option was to run VACUUM with index cleanup disabled, which sidesteps the memory pressure but leaves the indexes bloated and in need of a rebuild later.
VACUUM (INDEX_CLEANUP OFF, VERBOSE) vectors.embeddings_summary;
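If the memory pressure comes from VACUUM's own dead-tuple tracking rather than from the lock table, another option before disabling index cleanup is to give the session more maintenance memory; the value below is illustrative, not what we run in production:
SET maintenance_work_mem = '1GB';
VACUUM (VERBOSE) vectors.embeddings_summary;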
3. Rebuilding Indexes Without Downtime
Rebuilding indexes on a large table locks the table for the duration of the operation — something that can take hours with data of this magnitude.
To avoid blocking live operations, I used REINDEX CONCURRENTLY, which rebuilds indexes without taking exclusive locks on the table. The downside? I had to rebuild one index at a time, which made the process slower but feasible during live operations.
REINDEX INDEX CONCURRENTLY idx_summaries_title_gist_lower;
To identify all indexes of a table, I use the following query:
SELECT indexname FROM pg_indexes WHERE tablename = 'articles';
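Because the rebuild happens one index at a time, it can help to let PostgreSQL generate the REINDEX statements instead of typing them out; a small sketch built on the same catalog view:
SELECT format('REINDEX INDEX CONCURRENTLY %I.%I;', schemaname, indexname)
FROM pg_indexes
WHERE tablename = 'articles';
Each resulting statement still has to be run separately, since REINDEX CONCURRENTLY cannot run inside a transaction block.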
To identify the biggest indexes of a schema, I use this query:
SELECT
i.indexrelname as index_name,
pg_size_pretty(pg_relation_size(c.oid)) as index_size,
pg_get_indexdef(i.indexrelid) as create_statement
FROM pg_stat_all_indexes i
JOIN pg_class c ON i.indexrelid = c.oid
WHERE i.schemaname = 'vectors'
ORDER BY pg_relation_size(c.oid) DESC;
Partitioning: A Lifesaver with Room for Improvement
Partitioning has been essential for managing tables with millions, even billions, of records. Breaking large tables into smaller partitions enabled more targeted VACUUM and index rebuild operations. However, the process highlighted areas where my partitioning strategy could be improved:
Vertical and Horizontal Partitioning
While my tables were vertically partitioned, I realized the importance of horizontal partitioning - splitting data by time intervals, such as by month or quarter. This would allow older data to be dropped entirely without affecting active partitions.
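A sketch of what time-based (horizontal) range partitioning could look like; the table, column, and partition names here are hypothetical, not our actual schema:
CREATE TABLE vectors.embeddings_by_month (
    id bigint NOT NULL,
    created_at timestamptz NOT NULL,
    payload jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE vectors.embeddings_2024_12
    PARTITION OF vectors.embeddings_by_month
    FOR VALUES FROM ('2024-12-01') TO ('2025-01-01');

CREATE TABLE vectors.embeddings_2025_01
    PARTITION OF vectors.embeddings_by_month
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- Retiring a month of old data is then a metadata operation:
-- no dead tuples, no VACUUM, the space comes back immediately.
DROP TABLE vectors.embeddings_2024_12;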
Separate Disk Spaces for Partitions
A major learning from this experience is the benefit of placing partitions on separate disk spaces. By isolating partitions on different storage volumes, I could easily drop an entire partition of older data and immediately recover disk space.
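In PostgreSQL terms this maps to tablespaces: if a partition lives in its own tablespace on a separate volume, dropping that partition frees the volume right away. A minimal sketch with an illustrative path, reusing the hypothetical partitioned table from the previous example:
CREATE TABLESPACE cold_embeddings LOCATION '/mnt/cold_volume';

CREATE TABLE vectors.embeddings_2025_02
    PARTITION OF vectors.embeddings_by_month
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01')
    TABLESPACE cold_embeddings;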
Lessons Learned
- Plan for Scalability from Day One
Partitioning strategies should consider both vertical and horizontal scaling. Time-based partitioning is especially useful for scenarios involving historical data.
- Reclaiming Disk Space Is Painful Without Proper Design
Relying on VACUUM and index rebuilds for disk space management is not sustainable. Proactively designing for efficient data archival and deletion is critical.
- Concurrency Is Key During Live Migrations
Tools like REINDEX CONCURRENTLY and partition-level operations are essential for minimizing downtime during large migrations.
- Storage Flexibility Is a Must
Allocating partitions to different storage volumes offers the flexibility to reclaim space instantly by dropping older partitions.
Closing Thoughts
This migration has been a reminder of how critical it is to design systems with scalability, flexibility, and operational resilience in mind. While PostgreSQL has served us well, Qdrant offers the scalability and performance we need for managing vector embeddings at scale.
For anyone facing similar challenges, my advice is simple: invest time in designing your database schema and partitioning strategy upfront. It might seem like over-engineering early on, but it will save you countless hours (and headaches) down the road.
Have you faced similar challenges in your database migrations or large-scale data management projects? I’d love to hear about your experiences and strategies in the comments.