How to deal with cyclic foreign keys
Sometimes cyclic foreign keys can’t be avoided, but modern SQL features make it quite easy to insert new data without a big hussle.
Setup
Let’s assume we have the following two tables:
create table department
(
id integer primary key,
name varchar(50) not null unique
manager_id integer not null
);
create table employee
(
id integer primary key,
name varchar(50) not null,
department_id integer not null
);
alter table department
add foreign key (manager_id) references employee;
alter table employee
add foreign key (department_id) references department;
As both foreign key columns are defined as NOT NULL
, the usual approach of first inserting a NULL
value first and update the row later with an existing value is not applicable.
Note that in reality, I would expect at least one of those columns to be nullable though.
Running two independent inserts won’t work as we can’t insert into the department
table without a manager_id
and we can’t insert into the employee
table without a department_id
.
With modern SQL, we do however have two approaches to insert departments and employees that reference each other.
Deferred constraints
The problem stems from the fact that the database evaluates the constraints at the end of the INSERT
statement. But when the first INSERT
is executed, the referenced values for the second table don’t exists (and vice versa).
A solution to this problem is a constraint that is checked when the transaction is committed, rather then after each INSERT
statement.
Precisely for this situation, Postgres (and Oracle) allow constraints to be defined as “deferrable” which enables just that: an evaluation of the constraint at COMMIT
time.
The constraint can be defined as “always deferred” or you can turn the “deferrable” behaviour on or off when you need it:
alter table department
add foreign key (manager_id) references employee
deferrable initially immediate;
alter table employee
add foreign key (department_id) references department
deferrable initially immediate;
After creating the constraints like that, they behave exactly like any other constraint: they are evaluated when running the DML statement (because of the INITIALLY IMMEDIATE
option).
But the deferrable
attribute allows us to defer the check until the end of the transaction. This is done using the SET CONSTRAINTS
command:
With that command it’s either possible to do this for a named constraint instead or for all constraints that allow to be deferred.
So with the following script, we can insert the new department and all employees without an error:
begin transaction;
set constraints all deferred;
insert into department
(id, name, manager_id)
values
(1, 'Maintenance', 42);
insert into employee
(id, name, department_id)
values
(1, 'Arthur', 1),
(2, 'Ford', 1),
(42, 'Marvin', 1);
commit;
The employee_id
42 doesn’t exist when the department is inserted. But the database does not complain, because the foreign key will only be checked when the COMMIT
is issued. After inserting the employees all foreign keys are correct and the COMMIT
will succeed.
Setting a constraint to deferrable (either as INITIALLY DEFERRED
or through a SET CONSTRAINTS
statement), incurs a memory overhead that might be substantial when processing many rows. If inserting depending rows is the exception rather then the common case, it is recommended to define the constraints as INITIALLY IMMEDIATE
and only make them deferred when really needed.
Data modifying common table expressions.
Postgres offers an even more elegant solution: it’s possible to combine the two INSERT
statements into a single statement using a common table expression.
Because it’s a single statement, there is no need to define the constraints as deferrable, as the constraints are checked at the end of the statement.
The above inserts can be rewritten to the following statement (without the need for deferrable constraints)
with new_deparment as (
insert into department
(id, name, manager_id)
values
(1, 'Maintenance', 42);
)
insert into employee
(id, name, department_id)
values
(1, 'Arthur', 1),
(2, 'Ford', 1),
(42, 'Marvin', 1);
The advantage of this is that it’s also possible even when it’s not possible to use explicit transactions (e.g. because the connection is set to auto-commit).
Conclusion
While it’s best to avoid cyclic foreign keys to begin with, it is rather easy to deal with them when needed with modern SQL.