How to deal with cyclic foreign keys

2020-05-04

Sometimes cyclic foreign keys can’t be avoided, but modern SQL features make it quite easy to insert new data without a big hussle.

Setup

Let’s assume we have the following two tables:

create table department
(
   id         integer primary key, 
   name       varchar(50) not null unique
   manager_id integer not null
);

create table employee
(
  id            integer primary key,
  name          varchar(50) not null,
  department_id integer not null
);

alter table department
  add foreign key (manager_id) references employee;
  
alter table employee
  add foreign key (department_id) references department;

As both foreign key columns are defined as NOT NULL, the usual approach of first inserting a NULL value first and update the row later with an existing value is not applicable.

Note that in reality, I would expect at least one of those columns to be nullable though.

Running two independent inserts won’t work as we can’t insert into the department table without a manager_id and we can’t insert into the employee table without a department_id.

With modern SQL, we do however have two approaches to insert departments and employees that reference each other.

Deferred constraints

The problem stems from the fact that the database evaluates the constraints at the end of the INSERT statement. But when the first INSERT is executed, the referenced values for the second table don’t exists (and vice versa).

A solution to this problem is a constraint that is checked when the transaction is committed, rather then after each INSERT statement.

Precisely for this situation, Postgres (and Oracle) allow constraints to be defined as “deferrable” which enables just that: an evaluation of the constraint at COMMIT time.

The constraint can be defined as “always deferred” or you can turn the “deferrable” behaviour on or off when you need it:

alter table department
  add foreign key (manager_id) references employee
  deferrable initially immediate;
  
alter table employee
  add foreign key (department_id) references department
  deferrable initially immediate;

After creating the constraints like that, they behave exactly like any other constraint: they are evaluated when running the DML statement (because of the INITIALLY IMMEDIATE option).

But the deferrable attribute allows us to defer the check until the end of the transaction. This is done using the SET CONSTRAINTS command:

With that command it’s either possible to do this for a named constraint instead or for all constraints that allow to be deferred.

So with the following script, we can insert the new department and all employees without an error:

begin transaction;
set constraints all deferred;

insert into department 
  (id, name, manager_id) 
values 
  (1, 'Maintenance', 42);

insert into employee 
  (id, name, department_id) 
values
  (1, 'Arthur', 1), 
  (2, 'Ford', 1),
  (42, 'Marvin', 1);
  
commit;

The employee_id 42 doesn’t exist when the department is inserted. But the database does not complain, because the foreign key will only be checked when the COMMIT is issued. After inserting the employees all foreign keys are correct and the COMMIT will succeed.

Setting a constraint to deferrable (either as INITIALLY DEFERRED or through a SET CONSTRAINTS statement), incurs a memory overhead that might be substantial when processing many rows. If inserting depending rows is the exception rather then the common case, it is recommended to define the constraints as INITIALLY IMMEDIATE and only make them deferred when really needed.

Data modifying common table expressions.

Postgres offers an even more elegant solution: it’s possible to combine the two INSERT statements into a single statement using a common table expression.

Because it’s a single statement, there is no need to define the constraints as deferrable, as the constraints are checked at the end of the statement.

The above inserts can be rewritten to the following statement (without the need for deferrable constraints)

with new_deparment as (
  insert into department 
    (id, name, manager_id) 
  values 
    (1, 'Maintenance', 42);
)
insert into employee 
  (id, name, department_id) 
values
  (1, 'Arthur', 1), 
  (2, 'Ford', 1),
  (42, 'Marvin', 1);

The advantage of this is that it’s also possible even when it’s not possible to use explicit transactions (e.g. because the connection is set to auto-commit).

Conclusion

While it’s best to avoid cyclic foreign keys to begin with, it is rather easy to deal with them when needed with modern SQL.