Data Types and Constraints
— Ancient Chinese proverb
Numeric Data Types
The integer
data type allows you to store integers.
You usually use this data type for counts, identifiers etc.
In our example, we use integer
for the duration
column of the task
table.
Note that there are actually multiple integer data types in PostgreSQL.
The smallint
data type allows you to store "small-range integers" and the bigint
data type allows you to store "large-range integers".
The difference between smallint
, integer
and bigint
is basically the number of bytes they contain which impacts the minimum and maximum integers that can be stored.
A smallint
value has 2 bytes, i.e. it can store values from -32768
up to 32767
.
An integer
value has 4 bytes, i.e. it can store values from -2147483648
up to 2147483647
.
A bigint
value has 8 bytes, i.e. it can store values from -9223372036854775808
up to 9223372036854775807
.
For most regular web applications, integer
values are more than enough, and by default, PostgreSQL uses integer
values for numbers.
Consider this example:
select 41234, pg_typeof(41234);
This will return:
| ?column? | pg_typeof |
| -------- | --------- |
| 41234 | integer |
The
pg_typeof
operator returns the data type of an expression. Additionally, you should ignore the column names in this section—only the column values are relevant for us here.
Note that by default integers have the type integer
even if they could be smallint
.
Consider this example:
select 42, pg_typeof(42);
This will return:
| ?column? | pg_typeof |
| -------- | --------- |
| 42 | integer |
You could use the ::
operator to convert a value into a different data type:
select 42::smallint, pg_typeof(42::smallint);
Note that
::
is a PostgreSQL-specific operator and not part of the official SQL standard.
This will return:
| int2 | pg_typeof |
| ---- | --------- |
| 42 | smallint |
Of course, if you provide a bigint
value, PostgreSQL will use the bigint
data type, since bigint
doesn't fit into integer
:
select 2147483652, pg_typeof(2147483652);
This will return:
| ?column? | pg_typeof |
| ---------- | --------- |
| 2147483652 | bigint |
If you would try to convert this value to an integer
, you would get an error.
For example, trying to execute select 2147483652::integer
would result in the error ERROR: 22003: integer out of range
.
The serial
data type allows you to represent autoincrementing integers (along with smallserial
and bigserial
).
This makes serial
very useful for unique identifiers since you don't need to handle the incrementing logic yourself.
In the task
table, the id
column has the serial
data type.
Note that there are many other data types and strategies for automatically handling unique identifiers. For simple applications it doesn't matter, so in this chapter we will stick to the
serial
data type.
PostgreSQL knows four data types for storing real numbers—decimal
, numeric
, real
and double precision
.
The decimal
and numeric
types are equivalent and allow you to store numbers with a very large number of digits.
These data types are recommended for quantities where your floating-point calculations need to be exact (e.g. when working with money).
By default, PostgreSQL will use numeric
for floating-point numbers:
select 0.5::double precision, pg_typeof(0.5);
This will return:
| float8 | pg_typeof |
| ------ | --------- |
| 0.5 | numeric |
Note that because of the arbitrary precision requirement, calculations on values of the numeric
data type are relatively slow.
The real
and double precision
data types allow you to store floating-point numbers with a precision of 6
and 15
digits, respectively.
Consider this example:
select 0.123456789123456789::real, pg_typeof(0.123456789123456789::real);
This will return:
| float4 | pg_typeof |
| -------- | --------- |
| 0.123457 | real |
Note that you only have a precision of 6
digits left after the conversion.
Something similar happens for the double precision
data type:
select 0.123456789123456789::double precision, pg_typeof(0.123456789123456789::double precision);
This will return:
| float8 | pg_typeof |
| ----------------- | ---------------- |
| 0.123456789123457 | double precision |
Note that you have a precision of 15
digits now (which is more than the real
data type, but still not enough to represent the number in the given example).
Character Data Types
PostgreSQL supports the char(n)
, varchar(n)
and text
data types for storing characters and strings.
The char(n)
and varchar(n)
data types can store strings up to n
characters in length.
For example, conversions to char(n)
and varchar(n)
would silently truncate the string:
select 'Next.js book'::char(3), pg_typeof('Next.js book'::char(3));
This will return:
| bpchar | pg_typeof |
| ------ | --------- |
| Nex | character |
You will get a similar result if you use varchar
:
select 'Next.js book'::varchar(3), pg_typeof('Next.js book'::varchar(3));
This will return:
| varchar | pg_typeof |
| ------- | ----------------- |
| Nex | character varying |
The text
data type allows you to store strings of arbitrary length.
For example:
select 'Next.js book'::text, pg_typeof('Next.js book'::text);
This will return:
| text | pg_typeof |
| ------------ | --------- |
| Next.js book | text |
Note that while theoretically char(n)
and varchar(n)
have minor performance advantages over text
, these performance advantages are usually irrelevant in practice (at least for PostgreSQL), so throughout this book we will simply always use the text
data type.
In our task
table we use text
both for the title
and the description
column.
If you're interested in more details about the tradeoffs between
char(n)
,varchar(n)
andtext
, check out the official PostgreSQL documentation.
Date/Time Data Types
Dates and times are a famously dreaded topic among programmers, especially in combination with persistent data storage. We will only look at the very tip of the iceberg here. However, you should keep in mind that there are many complexities that we will skip for now.
The three most important date/time data types are date
, time
and timestamp
.
The date
data type allows you to store the date part (year, month, day) without the time information.
Consider this example:
select '2023-07-04'::date, pg_typeof('2023-07-04'::date);
This will return:
| date | pg_typeof |
| ---------- | --------- |
| 2023-07-04 | date |
The time
data type is basically the "complement" to the date
type and allows you to store the time part (hours, minutes, seconds) without the date information.
Consider this example:
select '07:05:16'::time, pg_typeof('07:05:16'::time);
This will return:
| time | pg_typeof |
| -------- | --------- |
| 07:05:16 | time |
Finally, the timestamp
data type allows us to store a date and a time.
This is also the most commonly used date/time data type (since you usually care about the date and time).
Consider this example:
select '2023-07-04 07:05:16'::timestamp, pg_typeof('2023-07-04 07:05:16'::timestamp);
This will return:
| timestamp | pg_typeof |
| ------------------- | --------------------------- |
| 2023-07-04 07:05:16 | timestamp without time zone |
In our task
table, we use the created_at
column to store the time at which a task has been created.
Since we care about the date and the time, we use the timestamp
data type here.
There are additional data types like timestamp with time zone
(if we want to store the time zone with the timestamp), however for now we have enough data types to confidently work with dates and times in simple applications.
Enums
Enumerated data types are data types which comprise a static (and also ordered) set of values. This is a bit similar to creating a union of literal types in TypeScript (although enums and union types are not the same).
You can define an enum using the create type
statement.
For example, this is how we defined the status
enum:
create type status as enum ('todo', 'inprogress', 'done');
Once the enum is defined, it can be used like every other data type:
create table task (
-- code
status status,
-- code
);
Constraints
Often, we want to limit the kind of data that can be stored in a table beyond just limiting the data types. This can be accomplished with constraints.
The unique
constraint is used to ensure that the data in a column is unique among all the rows in a table.
For example, we gave the title
column the unique
constraint to ensure that there will never be two tasks with the same title:
create table task (
-- code
title text unique,
-- code
);
Another important constraint is the not null
constraint.
By default, you can insert a null
value as data, where null
is basically a special marker indicating that the given data doesn't exist (a bit similar to null
and undefined
in TypeScript).
The not null
constraint quite sensibly indicates that the inserted value can't be null
.
In our example, we want every task to have a title
, a description
and a status
so we mark these columns as not null
:
create table task (
-- code
title text not null,
description text not null,
-- code
);
However, we allow a task to not have a duration
and don't mark that column as not null
.
Basically, we apply the not null
constraint if we want to ensure that a value is always present in this column.
The primary key
constraint means that the column should be used to identify the rows of the table.
If a column is marked as primary key
, its values must be unique
and they can't be null
.
Since we want the id
to uniquely identify each task, we apply the primary key
constraint to the id
column:
create table task (
id serial primary key,
-- code
);
The check
constraint is used to specify a condition that each row must satisfy for the value to be accepted into a column.
We could have used this for the status
column by writing check in ('todo', 'inprogress', 'done')
.
However, since we knew the values in advance, we were able to use an enum instead.
Unlike enums though, check constraints allow more flexibility.
For example, we can give the duration
column a constraint that its values must be greater than 0:
create table task (
-- code
duration integer check (duration > 0),
-- code
);
The default
constraint allows us to specify a default value for a column.
If an insert operation doesn't provide a value for a column, PostgreSQL will automatically insert the specified value.
In our example, we want the created_at
value to simply be the current time and use the default
constraint together with current_timestamp
to accomplish that:
create table task (
-- code
created_at timestamp default current_timestamp
);