Understanding the Performance Difference in Adding Columns in PostgreSQL

Hagen Hübel
4 min readMay 20, 2024

--

Performance optimisation is crucial in databases, especially when dealing with large tables. Many database administrators and developers encounter a common scenario involving adding new columns to existing tables. Recently, I faced an interesting situation with PostgreSQL that revealed some insights into how the database handles column additions and updates. Here’s what I discovered and why it’s important.

The Question

Imagine you have a big table consisting of tens of thousands of records. You want to add a new nullable column without a default value and then execute an UPDATE statement to set the value of this new column to a specific value. This process takes a considerable amount of time. However, when you instead add a new column with a default value, it doesn't take as much time. Why is there such a difference in performance?

The Scenario

Let’s break down the scenario step by step to understand the underlying mechanisms and performance implications:

  1. Adding a New Nullable Column Without a Default Value
  2. Updating the New Column
  3. Adding a New Column with a Default Value

1. Adding a New Nullable Column Without a Default Value

When you add a new nullable column that doesn’t have a default value, PostgreSQL performs the operation very quickly. Here’s why:

  • Metadata Update: PostgreSQL only needs to update the table’s metadata to include the new column. It doesn’t need to modify every row in the table physically.
  • Implicit NULL: The new column will implicitly have a NULL value for all existing rows. PostgreSQL doesn't need to store these NULL values explicitly in each row.

Here’s an example SQL statement:

ALTER TABLE big_table ADD COLUMN new_column VARCHAR;

This operation is fast because it is essentially a change in the table’s schema rather than its data.

2. Updating the New Column

After adding the new column, if you execute an UPDATE statement to set the value of the new column, PostgreSQL must visit each row and modify it. This operation can be time-consuming, especially for large tables:

UPDATE big_table SET new_column = 'specific_value';
  • Row Updates: PostgreSQL must update each row in the table to set the new value. This means reading each row, modifying it, and writing it back to disk.
  • Resource Intensive: The process is resource-intensive, involving considerable I/O operations and possibly locking issues, which can slow down other operations on the table.

3. Adding a New Column with a Default Value

In contrast, adding a new column with a default value is optimised in PostgreSQL, especially starting from version 11. Here’s what happens:

  • Metadata Update: Similar to adding a nullable column, PostgreSQL updates the table’s metadata to include the new column.
  • Default Value Optimization: Instead of immediately updating every row to store the default value, PostgreSQL records the default value at the metadata level. This means that when the column is read for an existing row, PostgreSQL returns the default value without actually having to write it to each row.

Here’s the SQL statement for adding a column with a default value:

ALTER TABLE big_table ADD COLUMN new_column VARCHAR DEFAULT 'default_value';

This operation is efficient because it defers the actual row updates. The default value is physically stored only when the row is next updated or a new row is inserted.

Why the Difference in Performance?

The performance difference boils down to how PostgreSQL handles these operations internally:

  • Nullable Column Without Default: Quick to add because it only involves a metadata change.
  • Updating Column Values is slow because it requires touching every row, reading, modifying, and writing it back.
  • A column with Default Value: Optimized because it leverages a metadata trick to defer physical row updates.

Practical Implications

Understanding this difference is crucial for database management and performance tuning. Here are some practical tips:

  • Avoid Immediate Bulk Updates: If you need to set a specific value for a new column, consider the performance impact of bulk updates on large tables.
  • Leverage Default Values: When adding new columns, use default values to take advantage of PostgreSQL’s optimisation.
  • Plan Maintenance Windows: Schedule operations that might involve extensive updates during maintenance windows to minimise user impact.

Conclusion

The way PostgreSQL handles adding columns with and without default values showcases the importance of understanding database internals for performance optimisation. By leveraging PostgreSQL’s optimisations, you can ensure efficient schema changes, even on large tables. Next time you need to modify your table schema, remember these insights to make informed decisions and keep your database running smoothly.

--

--

Hagen Hübel
Hagen Hübel

No responses yet