Database Administration
postgresql database-design performance datatypes array
Updated Fri, 20 May 2022 18:39:42 GMT

Does it make sense to store a couple of Boolean values as array?


I have a table with five Boolean columns. In 90% plus of the rows, all the columns are null. (False is equivalent to null for me.)

Instead of having Boolean columns, I could have a single array column which contains an array of an enumerated custom datatype, and thereby store only the columns which are non-null.

I feel weird using arrays, but my coworker has pointed out to me that there isn't really a strong reason against using them, and we might actually see savings from using them since we aren't storing a bunch of empty columns.

Are there any downsides to using an array? Specifically: would they take up more space, take up more time to query, or prevent Postgres features (e.g. gin indexes) from being used?




Solution

TL;DR: Do not use an array. Use individual boolean columns.


Your coworker may not be aware of actual storage requirements. Null storage is very cheap and efficient in Postgres. See:

A couple of boolean columns are also very cheap and efficient - nullable or not. Actually, just 5 boolean columns hardly matter for the row size at all - even if defined NOT NULL. 1 byte each, 5 bytes and no alignment restrictions. Hardly relevant. There are typically much more substantial things you can do to your row. (You should have provided your actual table definition.)

Arrays have an overhead of 24 bytes. Plus actual data. And handling is much less convenient and much more expensive:

This related answer has a detailed assessment for three relevant options:

  1. separate boolean columns
  2. an integer to encode up to 32 boolean values
  3. a bitstring (bit(n) or bit varying(n))





Comments (1)

  • +1 – Also the performance hit loading an array at runtime and fetching an element from it. Also the clarity hit by no longer having a dedicated, descriptive name for each boolean. — Aug 27, 2019 at 01:23