With the increasing popularity of Postgres, I see more and more questions asking for help to migrate code from other database systems such as Oracle or SQL Server to Postgres.
What I very often see is that the migration simply tries to translate the syntax from one system to another. This isn’t limited to migrations to Postgres: migrations from Oracle to SQL Server, or from SQL Server to Oracle, very often fall into the same trap.
Simply translating e.g. a LOOP from T-SQL to PL/pgSQL might technically work, but it is very often a mediocre solution because the features of the target database are not used to their full extent.
Some simple examples I stumbled upon are:
Using stored procedures to retrieve data
Especially in the SQL Server community it seems that stored procedures are the preferred way to encapsulate the logic needed to retrieve data. T-SQL makes this very easy, as SQL Server can dynamically determine the result sets returned by a procedure. However, this is not the case for Postgres (and, to a certain extent, for Oracle as well).
In computer science a procedure - by definition - has no “output” or “result”. So from a conceptual point of view, a procedure returning a result is something odd for someone who is not used to the “SQL Server way” of doing things. Additionally, stored procedures don’t allow the caller to join the result to other tables or to add additional ORDER BY clauses.
For Postgres, the natural choice for encapsulating the logic to retrieve data is a function. Postgres makes creating so-called “set returning functions” very easy. As I have shown in the post about Performance of stored functions, the language used also makes a difference.
So instead of fighting with Postgres’ limited ability to return result sets from a stored procedure (or a “void” function in versions prior to 11), using a set returning function is the natural choice.
Consider the following (simplified) T-SQL procedure:

create procedure get_data(@zip_code varchar(50))
as
begin
  declare @result table (
    customer_id   integer,
    customer_name varchar(20),
    zip_code      varchar(5)
  );

  insert into @result
  select c.id, c.customer_name, c.zip_code
  from customers c
  where c.zip_code like @zip_code + '%';

  select * from @result;
end;
When trying to migrate this approach 1:1 to Postgres (or Oracle, for that matter), things get complicated, and very often I hear people complaining that “Postgres is too complicated”.
However, with the right “Postgres mindset” (rather than the “SQL Server mindset”), the migration is as simple as:
create function get_data(p_zip_code varchar(50))
  returns table (
    customer_id   integer,
    customer_name varchar(20),
    zip_code      varchar(5)
  )
as
$$
  select c.id, c.customer_name, c.zip_code
  from customers c
  where c.zip_code like p_zip_code || '%';
$$
language sql;
Using that function or procedure is different, however. Where in SQL Server I could write:

exec get_data '1234';

in Postgres I now have to write:
select * from get_data('1234');
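Unlike a procedure’s result set, a set returning function can also participate in joins and sorting - exactly what the “SQL Server way” gives up. A sketch, where the orders table and its columns are assumed for illustration:

```sql
-- The function's result can be joined and sorted like any table.
-- "orders" and its columns are assumptions, not part of the example schema above.
select d.customer_name, o.order_date, o.amount
from get_data('1234') d
  join orders o on o.customer_id = d.customer_id
order by d.customer_name, o.order_date;
```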
Ignoring the capabilities of the target system
Neither Oracle nor SQL Server supports native arrays in SQL (Oracle has collections, but they can only be used in PL/SQL).
I regularly stumble upon systems where stored functions return their results as comma separated strings. Very often those strings are then used inside dynamic SQL to build IN expressions (I usually challenge the basic approach of passing lists of IDs around, rather than properly joining (partial) results - but that is a different topic).
Passing comma separated strings around is error prone and typically less efficient than using proper arrays in Postgres.
I have seen functions like this:
create function get_ids(@dept_id integer)
  returns varchar(max)
as
begin
  declare @result_list varchar(max);
  declare @id integer;
  declare dept_cursor cursor for
    select employee_id
    from employees
    where department_id = @dept_id;

  open dept_cursor;

  fetch next from dept_cursor into @id;
  set @result_list = cast(@id as varchar(20));

  fetch next from dept_cursor into @id;
  while @@fetch_status = 0
  begin
    set @result_list = @result_list + ',' + cast(@id as varchar(20));
    fetch next from dept_cursor into @id;
  end;

  close dept_cursor;
  deallocate dept_cursor;

  return @result_list;
end;
The use of a cursor is very questionable to begin with, but I assume this was created long before SQL Server supported string_agg().
But very often developers with the “SQL Server mindset” aren’t aware that Postgres offers arrays and a handy aggregate function (array_agg()) to collect values into an array, so they migrate the above function using a cursor and string concatenation as well.
However, using the “Postgres mindset”, the correctly migrated function would be:

create function get_ids(p_dept_id integer)
  returns int[]
as
$$
  select array_agg(employee_id)
  from employees
  where department_id = p_dept_id;
$$
language sql;
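The array returned by such a function can be consumed directly in SQL - no dynamic SQL and no IN-list string building needed. A sketch, using 42 as a hypothetical department id:

```sql
-- Filter with the array directly instead of building an IN list:
select *
from employees
where employee_id = any (get_ids(42));

-- Or turn the array back into rows to join against other tables:
select e.*
from unnest(get_ids(42)) as ids(employee_id)
  join employees e on e.employee_id = ids.employee_id;
```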
Using an array typically results in about 30% less memory usage at run time compared to a comma separated string of numbers.
Using the wrong data types
number vs. numeric
Postgres has a very rich set of data types supporting many standard and non-standard types that can make life a lot easier when used properly.
A common error I see when migrating from Oracle to Postgres is to use numeric(22) as a replacement for Oracle’s number(22).
While this works, it’s not an efficient solution. Oracle’s number data type is a variable length data type (similar to varchar) where the storage size depends on the actual value: bigger numbers need more storage. In Postgres, numeric is a much more complex data type that also supports fractional digits, and it’s a lot slower and uses more space than e.g. a native bigint.
The correct data type in Postgres for an integer-only number(22) in Oracle is therefore bigint.
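As a sketch of the mapping (the orders table and its columns are assumed for illustration): integer-only Oracle columns map to Postgres’ native integer types, while numeric stays reserved for values that really need fractional digits:

```sql
-- Oracle:   create table orders (id number(22), amount number(10,2));
-- A Postgres equivalent (table and column names are assumptions):
create table orders (
  id     bigint,         -- number(n) without fractional digits -> int/bigint
  amount numeric(10,2)   -- keep numeric where decimal digits are required
);
```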
bit vs. boolean
Unlike most other relational databases, Postgres fully supports the boolean data type. In SQL Server, bit is very often used for columns that act as “true/false” flags. Postgres has a bit type as well, but using it for a “flag” makes queries more complicated than they should be.
A boolean column named e.g. is_active can be used like this:
select * from some_table where not is_active;
With a bit column (e.g. bit(1)) the above would have to be written as:
select * from some_table where is_active = B'0';
But more importantly: the boolean type uses less storage space. A single boolean column uses 1 byte, while a bit(1) column requires 6 bytes of storage, because the bit data type has an overhead of 5 to 8 bytes.
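If such a bit flag did make it into the schema through a 1:1 migration, it can be converted in place; a sketch, assuming the some_table and is_active names from the example above:

```sql
-- Convert the bit(1) flag to a proper boolean.
-- The USING expression maps B'1' to true and B'0' to false.
alter table some_table
  alter column is_active type boolean
  using is_active = B'1';
```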
Storing UUIDs
Both Postgres and SQL Server have specialized data types for storing UUIDs. Oracle, however, does not have such a data type, and thus to store a UUID in Oracle one must use the RAW data type if efficiency and storage size are important. Using a VARCHAR column avoids the cumbersome RAW type but requires much more storage space in the table and any index.
But I have seen migrations that insisted on doing a 1:1 migration and created the UUID columns as bytea in Postgres (or something similar in SQL Server). This has many negative side effects. Byte arrays are usually very ugly to look at or to specify literal values for. But more importantly, one loses the benefits of the native uuid type, such as input validation and efficient comparison and indexing.
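In Postgres the native uuid type is the obvious choice; a sketch (table and column names are assumptions; gen_random_uuid() is built in since Postgres 13, earlier versions need the pgcrypto extension):

```sql
-- Native uuid column with a server-side generated default.
create table customers_uuid (
  id   uuid primary key default gen_random_uuid(),
  name text not null
);

-- Literals are validated on input and stored in 16 bytes:
select 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'::uuid;
```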
When migrating to a new database product, it’s vital to get accustomed to the possibilities of the target system.
Blindly applying patterns and design choices that worked in the old system might lead to an implementation that doesn’t exploit the full potential of the new system. Very often the new database is blamed for being “slow” or “complicated” while that isn’t actually the case - it’s just that it works differently and requires a new mindset.