Gapless number generation
One aspect of database sequences which a lot of people stumble over, is that they can have gaps. Either because a value obtained with nextval
wasn’t used, or because rows were deleted.
For a surrogate key (“generated key”) this is not a problem, because those values only have to be unique and gaps are meaningless and can safely be ignored.
In some situations gapless number migh however be a (legal) requirement, e.g. for invoice numbers.
Very often such number are generated by locking the whole table for the duration of the INSERT (which would also prevent UPDATEs).
The (exclusive) locking of the table is not necessary however. The generation of the numbers can be serialized throug a row level lock instead.
Setup
For this, we need a table that stores the current value of our number generator. We only need a single table to support multiple generators.
create table number_generator
(
entity varchar(30) not null primary key,
value integer default 1 not null
);
For each table that needs a gapless number, we will create one row in the table number_generator
.
To initialize (“create”) a generator we insert a row into the table:
insert into number_generator (entity) values ('invoice_numbers');
To get the next number for a generator, we can use a function:
create or replace function next_value(p_entity varchar)
returns integer
as
$$
update number_generator
set value = value + 1
where entity = lower(p_entity)
returning value;
$$
language sql
volatile;
The function can also be changed to not require a row in the table:
create or replace function next_value(p_entity varchar)
returns integer
as
$$
insert into number_generator (entity)
values (lower(p_entity))
on conflict (entity)
do update set value = number_generator.value + 1
returning value;
$$
language sql
volatile;
Usage
When defining a table, we can use that function as the default value for our invoice number:
create table invoice
(
invoice_number integer not null primary key default next_value('invoice_numbers'),
due_date date not null
... other columns ...
);
When we want to insert a new invoice, we let the database generate the number:
insert into invoice (invoice_number, due_date, ...)
values (default, current_date, ...)
returning invoice_number;
The function next_value()
acquires an exclusive lock for the row in the number_generator
table which is released when the transaction requesting the number is committed or rolled back.
If two transactions try to insert a new invoice, the second one waits on the row-lock of the number_generator
table.
If the first transaction commits, the function will “see” the committed value and increment that value correctly. If the first transaction rolls back, the second one ses the “old” value of the value and increments that.
Drawbacks
Of course this reduces the INSERT throughput because only one INSERT at a time can finish without waiting. This is the price that has to be paid for getting a gapless sequence. But it’s still more efficient than the alternative, which would be to lock the whole table before any insert.
If the table lock was made in exclusive mode, this would even block SELECTs from the table in Postgres.
This approach does not re-use numbers from deleted rows. But typically when gapless numbers are required, DELETEs are not allowed anyway.
Extend to context-aware numbers
Very often numbers need to have different “ranges”, e.g. each customer needs to have their own numbers starting with 1
The above concept can be extended to do that, by using one row for each customer in the number_generator
table. As this requires passing a dynamic value to next_value(), the expression can no longer be used as a default value (Postgres won’t allow default values based on other columns).
The function needs to be called during insert:
insert into invoice
(invoice_number, customer_id, due_date)
values
(next_value('customer_42_'), 42, date '2020-05-17');