PIVOT Alternatives

2019-03-22

One type of problems is inherently complicated with SQL: turning rows into columns. This is very often referred to as a “crosstab” or “pivot”.

The main reason why a crosstab is complicated in SQL is, that the number of columns that a query returns must be known before the execution of the query starts. Additionally all rows have to have the same number of columns in SQL.

Consider the following setup, where products can be assigned to multiple categories:

create table product  (
   prod_id   integer not null primary key,
   name      varchar(100) not null,
   price     numeric
);

create table category (
   cat_id integer not null primary key, 
   name   varchar(100) not null unique,
   discount numeric
);
      
create table product_category (
   prod_id integer not null references product,
   cat_id integer not null references category, 
   is_active boolean not null default true,
   primary key (prod_id, cat_id)
);

The sample data can be downloaded here.

Now let’s assume we would like to list every product (one row for each product) and the categories it belongs to as columns. Oracle and SQL Server support the PIVOT operator:

select *
from (
  select p.name as product,
         c.name as category, 
         c.cat_id
  from product p
    left join product_category pc on pc.prod_id = p.prod_id       
    left join category c on c.cat_id = pc.cat_id
)    
pivot (min(category) AS cat FOR (cat_id) IN (1, 2, 3, 4, 5, 6))
order by product;

In Postgres (and other DBMS products) one would need to resort to conditional aggregation:

select p.name as product, 
       max(c.name) filter (where c.name = 'Electronics') as category_1,
       max(c.name) filter (where c.name = 'TV Sets') as category_2,
       max(c.name) filter (where c.name = 'Tools') as category_3,
       max(c.name) filter (where c.name = 'Outdoor Equipment') as category_4,
       max(c.name) filter (where c.name = 'Displays') as category_5,
       max(c.name) filter (where c.name = 'Laptops') as category_6
from product p
  left join product_category pc on pc.prod_id = p.prod_id       
  left join category c on c.cat_id = pc.cat_id
group by p.name
order by p.name;

Based on the sample data the output would look like this:

product          | category_1  | category_2 | category_3 | category_4        | category_5 | category_6
-----------------+-------------+------------+------------+-------------------+------------+-----------
132cm TV Set     | Electronics | TV Sets    |            |                   | Displays   |           
152cm TV Set     | Electronics | TV Sets    |            |                   | Displays   |           
55cm Monitor     | Electronics |            |            |                   | Displays   |           
70cm Monitor     | Electronics |            |            |                   | Displays   |           
High End Laptop  | Electronics |            |            |                   |            | Laptops   
Screwdriver      |             |            | Tools      |                   |            |           
Standard Laptop  | Electronics |            |            |                   |            | Laptops   
Swiss Army Knife |             |            | Tools      | Outdoor Equipment |            |

It is pretty obvious that this gets really cumbersome if a new category is added and it also doesn’t scale very well.

Postgres also offers a crosstab function to do this. But just like conditional aggregation, it suffers from the same problem: the query needs to be changed each time a new category is added.

The above output is very usefull if the data should be imported into e.g. a spreadsheet program like LibreOffice Calc.

But if that data is intended to be displayed e.g. on a web page, it is a lot easier to return the categories as a JSON value that can directly be used by the frontend.

The easiest way is to return the categories as a single JSON array:

select p.name as product, 
       jsonb_agg(c.name order by c.name) as categories
from product p
  left join product_category pc on pc.prod_id = p.prod_id       
  left join category c on c.cat_id = pc.cat_id
group by p.name
order by p.name;

Without seeing the output, it is obvious that this query doesn’t need to be changed when more categories are added. The output for the above (Postgres) query would be:

product          | categories                            
-----------------+---------------------------------------
132cm TV Set     | ["Displays", "Electronics", "TV Sets"]
152cm TV Set     | ["Displays", "Electronics", "TV Sets"]
55cm Monitor     | ["Displays", "Electronics"]           
70cm Monitor     | ["Displays", "Electronics"]           
High End Laptop  | ["Electronics", "Laptops"]            
Screwdriver      | ["Tools"]                             
Standard Laptop  | ["Electronics", "Laptops"]            
Swiss Army Knife | ["Outdoor Equipment", "Tools"]

Why JSON? We could also easily use string_agg() to get a comma separated list of categories.

JSON has some advantages: it’s a well defined format and we don’t need to think about what characters to escape. The value can directly be used by a HTML/JavaScript frontend. But much more importantly: it enables us to easily extend what we return.

The product_category table contains a flag if the assignment to the category is currently active. Adding that column to the “traditional” PIVOT or filtered aggregation query would massively blow up the code. When we return this information as JSON this is very easy:

select p.name as product, 
       jsonb_agg(jsonb_build_object(c.name, jsonb_build_object('active', pc.is_active))) as categories
from product p
  left join product_category pc on pc.prod_id = p.prod_id       
  left join category c on c.cat_id = pc.cat_id
group by p.name
order by p.name;

This gives us:

product          | categories                                                                                         
-----------------+----------------------------------------------------------------------------------------------------
132cm TV Set     | [{"Displays": {"active": false}}, {"TV Sets": {"active": true}}, {"Electronics": {"active": true}}]
152cm TV Set     | [{"Displays": {"active": false}}, {"TV Sets": {"active": true}}, {"Electronics": {"active": true}}]
55cm Monitor     | [{"Displays": {"active": true}}, {"Electronics": {"active": true}}]                                
70cm Monitor     | [{"Displays": {"active": true}}, {"Electronics": {"active": true}}]                                
High End Laptop  | [{"Laptops": {"active": true}}, {"Electronics": {"active": true}}]                                 
Screwdriver      | [{"Tools": {"active": true}}]                                                                      
Standard Laptop  | [{"Laptops": {"active": true}}, {"Electronics": {"active": true}}]                                 
Swiss Army Knife | [{"Tools": {"active": false}}, {"Outdoor Equipment": {"active": true}}]

Returning the PIVOT result as a JSON value is much more flexible and can easier be changed if the requirements change. All modern DBMS support JSON and generating JSON content in one way or the other, so the above solution does not only apply to Postgres.

The drawback of this solution is however, that the result can’t easily be used in a spreadsheet program.