- Migrating your SQL Server Workloads to PostgreSQL – Part 1
- Migrating your SQL Server Workloads to PostgreSQL – Part 2 (This article)
- Migrating your SQL Server Workloads to PostgreSQL – Part 3
- Migrating your SQL Server Workloads to PostgreSQL – Part 4
SQL Server vs PostgreSQL
For a high-level comparison, check this DB Engines page here.
Concurrency control of row data is an important aspect of database systems that determines how efficiently row data is maintained and updated. Different database systems have different concurrency control mechanisms that have a direct impact on their performance.
SQL Server Concurrency Control
SQL Server versions prior to 2005 traditionally used Row level exclusive locking that locks the entire row during write operations. This prevents other read and write threads from accessing the row and they had to wait till the current write operation on the row completed (either committed or rolled back). This has a significant impact on performance of the database during heavy concurrent writes and reads.
SQL Server 2005 improved this scenario by introducing “row-versioning isolation levels”. See Locking and Row Versioning. Note that there are two separate MVCC implementations, read committed isolation using row versioning (RCSI) and snapshot isolation (SI)
SQL Server queries can return dirty (uncommitted) rows depending on the isolation level of the transactions.
PostgreSQL Concurrency Control
PostgreSQL maintains data consistency using a multiversion model (Multiversion Concurrency Control, MVCC), which means that each SQL statement sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data. This prevents statements from viewing inconsistent data produced by concurrent transactions performing updates on the same data rows, providing transaction isolation for each database session. MVCC avoids locking rows which minimizes lock contention and improves overall performance.
The main advantage of using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading. PostgreSQL maintains this guarantee even when providing the strictest level of transaction isolation through the use of an Serializable Snapshot Isolation (SSI) level.
PostgreSQL queries never return dirty rows, no matter what the isolation level of the transaction is.
The Database Structure
Like SQL Server, PostgreSQL can contain multiple Databases within a single instance, and each database can contain multiple Schemas. Each of these schemas can contains other database objects like Tables, Views, Stored Procedures, Functions, etc.
When a connection to a database is opened in PostgreSQL, you can only refer to that database within that connection. If your queries on the connection refer to a different database running on the same instance, PostgreSQL will throw an error. You either open a connection for each database separately or use dblink or Foreign Data Wrappers to query tables from multiple databases.
T-SQL vs PL/pgSQL
The SQL Server procedural extension to the standard SQL language is called Transact-SQL or T-SQL in short and provides additional capabilities like procedural programming, looping constructs, conditional constructs, session management, Stored Procedures, etc.
PostgreSQL multiple procedural languages and the default language is called PL/pgSQL.
PostgreSQL has a more refined SQL syntax
Once you start using PostgreSQL, you’ll notice that it has a much cleaner and refined language syntax compared to SQL Server. Compare these statements:
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[customers]') AND type IN (N'U')) BEGIN CREATE TABLE [dbo].[customers]( .... ) END
CREATE TABLE IF NOT EXISTS customers ( .... )
PostgreSQL code is much simpler, cleaner and refined. SQL Server code is definitely a maintenance nightmare.
Few more examples:
-- This throws an error if the table doesn’t exist DROP TABLE [dbo].[customers] -- This is how you do it in SQL Server 2014 and older IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[customers]') AND type IN (N'U')) BEGIN DROP TABLE [dbo].[customers] END -- This works in SQL Server 2016 and newer DROP TABLE IF EXISTS [dbo].[customers]
DROP TABLE IF EXISTS customers; -- No error if the table doesn’t exist
Similarly, PostgreSQL supports
CREATE OR REPLACE syntax for Stored Procedures, Functions, Views etc. But in SQL Server 2014 and prior, you would have to check if the object exists and then drop it before attempting to create it again. You can use the
ALTER statement, but it throws an error if the object doesn’t already exist. SQL Server 2016 SP1 introduced the
CREATE OR ALTER syntax for Stored Procedures, Functions, Triggers and Views.
SQL Server 2016 and above support
DROP IF EXISTS for Tables/Views but still does not support the
CREATE IF NOT EXISTS syntax.
Case sensitivity of object names
In SQL Server; object names Customers, CUSTOMERS and customers are all the same. SQL Server does not impose case-sensitivity in object names and creates objects with the same case specified in the
In PostgreSQL, object names are silently converted to lowercase. When a name is enclosed in double quotation marks, the name becomes case sensitive and must be used with quotation marks in queries. Therefore, the above names are converted to customers, but “Customers” (with quotes) is treated as a different object as must the used with quotes in queries that reference the object. This rule holds good for column names in tables as well.
Migrating database objects from SQL Server to PostgreSQL is pretty straightforward as most of these objects are supported as-is in the target platform.
|Materialized Views||Yes||Yes, starting v9.3|
|Computed Column||Yes||Yes, available in v12 Use Views for earlier versions||Computed Columns are called Generated Columns in PostgreSQL|
|Stored Procedures||Yes||Yes, starting v11||Prior to PostgreSQL 11, developers used Functions instead of Stored Procedures|
|User Defined Functions||Yes||Yes|
|Overloaded Functions||No||Yes||In PostgreSQL, you can have multiple functions with the same name but different parameters.|
|Common Table Expressions (CTE)||Yes||Yes||PostgreSQL 11 and older: CTEs are always materialized and may not perform well. PostgreSQL 12: Query hints allow CTEs to be either materialized or non-materialized.|
|Max Table Size||Unlimited||32 Terabytes (32TB) in PostgreSQL 9.6 or earlier 2 Exabytes (2EB) in PostgreSQL 10|
|Max Columns per table||1024 (non-wide) 30000 (wide)||250 – 1600 depending on column types|
|Max Database size||524,272 Terabytes||Unlimited|
|Max Varchar Length||Varchar(Max)||Varchar(10485760) Use text for longer strings|
|Scheduler||SQL Server Agent||pgAgent|
Irrespective of whether you use tools or use manual method, keep the following data type mappings between SQL Server and PostgreSQL data types while doing the migration. Some tools allow you to play around the data type mappings to fine tune the migration process.
|Microsoft SQL Server||Description||PostgreSQL|
|CHAR(n)||Fixed length char string, 1 <= n <= 8000||CHAR(n)|
|VARCHAR(n)||Variable length char string, 1 <= n <= 8000||VARCHAR(n)|
|VARCHAR(max)||Variable length char string, <= 2GB||TEXT|
|NVARCHAR(n)||Variable length Unicode UCS-2 string||VARCHAR(n)|
|NVARCHAR(max)||Variable length Unicode UCS-2 data, <= 2GB||TEXT|
|TEXT||Variable length character data, <= 2GB||TEXT|
|NTEXT||Variable length Unicode UCS-2 data, <= 2GB||TEXT|
|UNIQUEIDENTIFIER||16 byte GUID(UUID) data||CHAR(16)|
|INTEGER||32 bit integer||INTEGER|
|TINYINT||8 bit unsigned integer, 0 to 255||SMALLINT|
|DOUBLE PRECISION||Double precision floating point number||DOUBLE PRECISION|
|FLOAT(p)||Floating point number||DOUBLE PRECISION|
|NUMERIC(p,s)||Fixed point number||NUMERIC(p,s)|
|SMALLMONEY||32 bit currency amount||MONEY|
|DATE||Date includes year, month and day||DATE|
|DATETIME||Date and Time with fraction||TIMESTAMP(3)|
|DATETIME2(p)||Date and Time with fraction||TIMESTAMP(n)|
|DATETIMEOFFSET(p)||Date and Time with fraction and time zone||TIMESTAMP(p) WITH TIME ZONE|
|SMALLDATETIME||Date and Time||TIMESTAMP(0)|
|BIT||1, 0 or NULL||BOOLEAN|
|BINARY(n)||Fixed length byte string||BYTEA|
|VARBINARY(n)||Variable length byte string, 1 <= n <= 8000||BYTEA|
|VARBINARY(max)||Variable length byte string, <= 2GB||BYTEA|
|ROWVERSION||Automatically updated binary data||BYTEA|
|IMAGE||Variable length binary data, <= 2GB||BYTEA|
This list contains only commonly used data types, PostgreSQL supports a vast variety of data types and also supports custom data types. For a comprehensive list of major data types supported by PostgreSQL, check out this post.
Built-In Functions & operators
PostgreSQL has an extensively rich set of operators and built-in functions, way beyond what is offered by SQL Server.
|Microsoft SQL Server||PostgreSQL||Comments|
For a comprehensive list of all PostgreSQL functions and operators, check out this page.
SQL Language differences
|Feature||Microsoft SQL Server||PostgreSQL|
|Select first N rows||TOP n||LIMIT n|
|Statement terminator||; (not required)||; (required)|
|LIKE operator||Case insensitive by default (determined by collation)||Case sensitive, use ILIKE for case insensitive comparison|
|Regular Expressions||LIKE operator||SIMILAR TO operator|
CTE Performance Differences
In SQL Server, this query:
WITH AllUsers AS (SELECT * FROM Users) SELECT * FROM AllUsers WHERE Id = 100;
results in a query plan for the entire query at once, and the WHERE clause filter is passed into the CTE. The resulting query plan is efficient, doing just a single clustered index seek.
In PostgreSQL, CTEs are optimization fences (outer query restrictions are not passed on to CTEs) and the database evaluates the query inside the CTE and caches the results (i.e., materialized results) and outer WHERE clauses are applied later when the outer query is processed, which means either a full table scan or a full index seek is performed and results in horrible performance for large tables. To overcome this, you rewrite this query in PostgreSQL as:
WITH UserRecord AS (SELECT * FROM Users WHERE Id = 100) SELECT * FROM UserRecord;
The other option is to rewrite the query using a subquery. Keep that in mind when migrating code and queries that involve CTEs.
Note that PostgreSQL 12 addresses this problem by introducing query optimizer hints to enable us to control if the CTE should be materialized or not: MATERIALIZED, NOT MATERIALIZED. So, the query can be rewritten as follows to achieve better performance.
WITH AllUsers AS NOT MATERIALIZED (SELECT * FROM Users) SELECT * FROM AllUsers WHERE Id = 100;
Collation / Ordering
There are few fundamental differences in the way SQL Server and PostgreSQL store and compare data. In SQL Server, the default collation is Case insensitive but in PostgreSQL, it is case sensitive. Therefore, your code/query that searches for text in
WHERE clause or
ON clause in joins or
LIKE statement might fail. PostgreSQL provides an
ILIKE statement for case insensitive comparison.
When Delete does not delete
In SQL Server, when a DELETE statement is issued against a table, the rows are permanently deleted from the storage. In PostgreSQL they’re just marked for deletion (soft delete) and not immediately removed from storage. The UPDATE statement behaves similarly, the old row is marked for deletion and a new row is inserted with the new row data.
Keep that in mind when migrating tables that undergo heavy deletes and updates regularly, because they’ll keep growing exponentially until you do a VACUUM. You will have to set up a job that Vacuums the tables in your databases at regular intervals which could de daily, weekly or monthly depending on the growth rate of each table.
The AutoVacuum daemon is turned on by default and ensures that Vacuuming is done automatically at regular intervals, so you may not have to worry in most cases. However, it could be turned off during installation or later by the Database Admin. If you host your PostgreSQL databases in AWS Aurora, you need to be careful even when AutoVacuum daemon is running. Check out this post for more information.
See VACUUM statement for more details.
Where PostgreSQL has an edge
PostgreSQL provides a rich set of operators and functions to work with different types of data.
PostgreSQL literally runs anywhere
PostgreSQL runs on Windows, Linux, Unix, etc. whereas SQL Server 2016 and older run only on Windows. SQL Server 2017 runs on Linux as well.
Inserting Test data into a PostgreSQL table is a breeze
This query inserts one million rows into the customers table with random data.
INSERT INTO customers (id, name) SELECT id, md5(random()::text) FROM generate_series(1, 1000000) AS id;
In SQL Server, you will be able to do this only using procedural code.
Multiple language support in PostgreSQL
There are currently four procedural languages available in the standard PostgreSQL distribution: PL/pgSQL, PL/Tcl, PL/Perl and PL/Python. PostgreSQL supports many other procedural languages as well, you just need to install the appropriate extension and enable it.
- PL/pgSQL: this PostgreSQL’s native procedural language is like SQL Server’s T-SQL, but more refined and feature-complete.
- PL/Python: PostgreSQL’s support for Python adds the enormous ecosystem of Python libraries to your arsenal. Imagine using Python functions in the middle of a SQL query, Yayy!!!
- PL/Perl: PostgreSQL has full support for Perl as a procedural language.
- PL/Tcl: PL/Tcl is a loadable procedural language for the PostgreSQL database system that enables the Tcl language to be used to write PostgreSQL functions.
- PL/R: PostgreSQL has full support for R, a statistical programming language used extensively in data science that has a robust set of high-quality plugins and add-ons.
C: Though PostgreSQL supports C, it has to be compiled separately. This comes in handy when speed and fine control of memory management, resource usage for tasks and performance are critical.
PostgreSQL also has extensions for other languages like Java, Ruby, PHP, Lua, Tcl, etc.
One can argue that SQL Server also supports multiple languages because it acts as a .NET host and can run code written in any .NET languages like C#, VB.NET, F#, etc. Yes, but that involves writing code in a .NET IDE like Visual Studio, compiling code, deploying the assembly in SQL Server, writing a wrapper T-SQL function for your C# function and then calling the function from your T-SQL code. Not to mention the debugging hell. You can’t write C# code “in-line” within your T-SQL code the way the LANGUAGE keyword in PostgreSQL allows you to write a function in multiple languages.
Having personally done multi-language programming in SQL Server, I can vouch for the fact that the level of multi-language support provided by PostgreSQL is at a different level compared to that of SQL Server.
PostgreSQL has a rich set of Functions and Operators
GREATEST function is similar to the
MAX function, except that it works across columns. The
LEAST is similar to
MIN function, except that it works across columns.
SELECT GREATEST(LastLoginDate, LastReportDate, LastInteractionDate) FROM Users;
You can’t do this in SQL Server, except with multiple SELECT statements and procedural constructs. Similarly, there are tons of functions available in PostgreSQL which are not available in SQL Server.
PostgreSQL has much better support for CSV
People working with data on a day-to-day basis would work with CSV often to convert data from one format to another and for analytical processing, etc. PostgreSQL’s COPY FROM and COPY TO command do a much cleaner job at handling CSVs: no more truncated texts, no more encoding nightmares, no more quoting/escaping issues.
You can drop an entire Schema in PostgreSQL with a single statement
In PostgreSQL, all you do is execute
DROP SCHEMA CASCADE. This is very useful during development but at the same time risky as hell in your Staging and Production environments.
In SQL Server,
DROP SCHEMA doesn’t support the
CASCADE clause. You must drop all the objects within the schema individually before dropping the schema, which can be a nightmare during development and prototyping.
Unicode and Character Encoding Basics
Unicode is a character encoding standard and UTF-8 and UTF-16 are different implementations of the standard.
UTF-8: UTF-8 uses one byte for characters codes below 128 and two, three or four bytes for characters beyond that. It is compatible with ASCII.
UTF-16: Always uses two or four bytes, not compatible with ASCII
UCS-2: Always uses two bytes.
PostgreSQL has native support for UTF-8 encoding and its CHAR, VARCHAR and TEXT types are UTF-8 by default. String operations and regular expressions are UTF-8 aware. PostgreSQL doesn’t support UTF-16.
SQL Server (prior to 2012) support only UCS-2, a subset of UTF-16. SQL Server 2012 introduced optional support for UTF-16, but you must select an UTF-16 collation for your database for it to work. SQL Server does not support UTF-8.
Where SQL Server has an edge
You cannot query multiple databases directly within a single query
Unlike SQL Server, you cannot reference two databases within a query directly. But there’s a workaround, you can use a dblink or a Foreign Data Wrapper.
You cannot execute procedural code directly in PostgreSQL
In SQL Server, you can declare variables or use conditional logic in scripts directly:
DECLARE @MeanAge = 100 IF EXISTS (SELECT * FROM Users WHERE Age > @MeanAge) SELECT 'Yay' ELSE SELECT 'Nay';
SQL Server will allow you to run this code with a Client tool or in Inline queries in application code. In PostgreSQL, you can run procedural code like this:
DO $$ -- declare BEGIN /* pl/pgsql here */ END $$;
However, you cannot return any results to the client. The only way you can run procedural code and return results to the client is by wrapping it within a Stored Procedure or Function and then executing it.
PostgreSQL doesn’t support Stored Procedures prior to version 11
PostgreSQL 10.x and below do not have Stored Procedures. But this is not a showstopper as you can do everything with a User Defined Function, with the only exception that you cannot have transactions inside a function.
PostgreSQL doesn’t support Computed Columns prior to version 12
PostgreSQL 11.x and below do not support Computed Columns. However, PostgreSQL 12, scheduled to be released in late 2019 introduces support for ‘Generated columns’. Check out this article for more information.