I have a table with five Boolean columns. In more than 90% of the rows, all five columns are null. (False is equivalent to null for my purposes.)
Instead of having Boolean columns, I could have a single array column which contains an array of an enumerated custom datatype, and thereby store only the columns which are non-null.
I feel weird using arrays, but my coworker has pointed out to me that there isn't really a strong reason against using them, and we might actually see savings from using them since we aren't storing a bunch of empty columns.
Are there any downsides to using an array? Specifically: would they take up more space, take up more time to query, or prevent Postgres features (e.g. gin indexes) from being used?
TL;DR: Do not use an array. Use individual boolean columns.
Your coworker may not be aware of actual storage requirements. Null storage is very cheap and efficient in Postgres.
A couple of boolean columns are also very cheap and efficient - nullable or not. Actually, just 5 boolean columns hardly matter for the row size at all - even if defined NOT NULL: 1 byte each, 5 bytes total, and no alignment restrictions. Hardly relevant. There are typically much more substantial ways to reduce your row size. (You should have provided your actual table definition.)
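For illustration, a minimal sketch (table and column names are hypothetical) showing how little five nullable boolean columns cost per row. `pg_column_size()` reports the actual on-disk size of a value or row:

```sql
-- Hypothetical table with five nullable boolean columns.
CREATE TABLE flags_demo (
  id     serial PRIMARY KEY
, flag_a boolean
, flag_b boolean
, flag_c boolean
, flag_d boolean
, flag_e boolean
);

-- A row where all flags are NULL stores no data bytes for them;
-- NULLs are tracked in the tuple header's null bitmap.
INSERT INTO flags_demo (flag_a) VALUES (NULL);
INSERT INTO flags_demo (flag_a, flag_b) VALUES (true, true);

-- Compare actual row sizes:
SELECT id, pg_column_size(flags_demo.*) AS row_bytes
FROM   flags_demo;
```

The two rows differ by only a couple of bytes, which illustrates why the nullable columns are essentially free here.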
Arrays have an overhead of 24 bytes, plus the actual data. And handling is much less convenient and much more expensive.
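To see the array overhead directly, one could compare value sizes with `pg_column_size()` (the enum type and its labels are hypothetical):

```sql
-- Hypothetical enum type for the array-based design.
CREATE TYPE flag AS ENUM ('a', 'b', 'c', 'd', 'e');

SELECT pg_column_size(true::boolean)   AS bool_true     -- 1 byte
     , pg_column_size('{}'::flag[])    AS empty_array   -- array header alone
     , pg_column_size('{a,b}'::flag[]) AS two_elements; -- header + 4 bytes per enum value
```

Even an empty array carries its header, so the array column starts out bigger than all five boolean columns combined.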
This related answer has a detailed assessment of three relevant options:

- boolean columns
- an integer to encode up to 32 boolean values
- bit strings (bit(n) or bit varying(n))
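As a quick sketch of the integer option: up to 32 boolean values can be packed into one int4 with bitwise operators (the flag positions below are arbitrary examples):

```sql
-- Encode: set bit 0 and bit 2.
SELECT (1 << 0) | (1 << 2) AS encoded;       -- 5

-- Decode: test whether bit 2 is set in the stored value.
SELECT (5 & (1 << 2)) <> 0 AS bit2_is_set;   -- true
```

This is the most compact option, at the cost of readability: every query must know which bit position means what.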