PostgreSQL enums should be used for values that directly impact application control flow. That is, they are for values that have specific meaning to the application code. They are not just data.
For example, consider an image conversion system where users can convert images from one format to another. A conversion could be stored in the database as something like the following:
create table conversion_jobs (
id int primary key generated by default as identity,
source_image_url text not null,
source_image_format_id unknown not null, -- column type to be determined
output_image_format_id unknown not null -- column type to be determined
);
The supported image formats are jpg
, png
, and gif
. No image formats can be added to the database without also updating the application. e.g. Adding tiff
support would require adding a new image decoder and encoder to the application.
How should these image formats be represented in the database?
The first question to answer is if a reference table is necessary. In this example it would be useful to have a description attached to each file type to display to the user. For example:
jpg
- best for photographspng
- best for illustrations with many colorsgif
- best for illustrations with few colorsSo a reference table is necessary.
create table image_formats (
id unknown primary key, -- column type to be determined
name text not null,
description text not null
);
But what column type should be used for the primary key? There are three reasonable choices: int
, text
, and an enum
.
int
would be the most traditional choice especially for certain ORMs. However, it can be inconvenient to use. To filter jobs based on image format requires either hard-coding an ID:
select *
from conversion_jobs
where output_image_format_id=3; -- magic number has no obvious meaning
Or joining to the image_formats
table:
select *
from conversion_jobs j
join image_formats f on j.output_image_format_id=f.id
where f.name='jpg';
The performance implication of the join is negligible, but it is a little inconvenient to have to write a join for such a simple filter.
A text
type would be much more convenient.
select *
from conversion_jobs
where output_image_format_id='jpg';
Depending on the size of the text
value there could be a performance loss relative to the int
, but in most cases it would be negligible.
However, both the int
and the text
key types have a deficiency: they cannot detect erroneous comparisons.
select *
from conversion_jobs
where output_image_format_id='foobar';
This will return 0 rows. It would be better if it was detected as an error.
An enum
type has the advantages of the approaches examined above in addition to the ability to detect invalid usage.
Here is what the complete schema would look like with an enum
.
create type image_format_id as enum ('jpg', 'png', 'gif');
create table image_formats (
id image_format_id primary key,
description text not null
);
insert into image_formats values
('jpg', 'best for photographs'),
('png', 'best for illustrations with many colors'),
('gif', 'best for illustrations with few colors');
create table conversion_jobs (
id int primary key generated by default as identity,
source_image_url text not null ,
source_image_format_id image_format_id not null references image_formats,
output_image_format_id image_format_id not null references image_formats
);
The queries are as simple as with the text
key:
select *
from conversion_jobs
where output_image_format_id='jpg';
But it will also detect invalid image formats:
select *
from conversion_jobs
where output_image_format_id='foobar';
ERROR: invalid input value for enum image_format_id: "foobar"
LINE 3: where output_image_format_id='foobar';
enum
Considerationsenum
values are stored as 4-byte integers. This means they frequently require less storage and are quicker to compare than text
. The performance aspect is usually small, but occasionally it matters.
Next, enum
values compare in the order they are defined rather than lexical order. For example:
create type priority as enum ('low', 'medium', 'high');
select
'low'::priority < 'high'::priority as enum_compare,
'low'::text < 'high'::text as lex_compare;
enum_compare │ lex_compare
──────────────┼─────────────
t │ f
This can be useful, but it is also confusing if it is not expected.
Lastly, adding a value to an enum
requires an alter type
statement rather than just an insert
.
enum
types are ideal for values that map to application logic. Reference tables are often presented as a preferable alternative to enum
types. I propose that they are not mutually exclusive approaches. The best solution will often use an enum
and a reference table.