Database Administration
postgresql query-performance execution-plan postgresql-performance
Updated Fri, 20 May 2022 18:42:06 GMT

Do databases optimize repeated similar query params on the same column?


I have the following query:

SELECT * FROM my_table t
WHERE t.id IN :ids
AND t.id IN :allowedIds

(where the named parameters :ids and :allowedIds are bound later)

Are these query params optimized by the database itself? I would like to avoid merging the two sets of ids in code because it hurts readability. The first constraint, :ids, is the filter the user passes in; the second, :allowedIds, is the set of ids the user has access to. So I am hoping that most databases, and Postgres in particular, optimize this sort of thing.




Solution

I'll ignore the "most databases" part of this question; otherwise I'd have to vote to close it for lack of focus. Instead, I'll answer about PostgreSQL.

To "merge" the lists in your question, you'd have to build the intersection:

WHERE id IN (1, 2, 3, 4) AND id IN (3, 4, 5, 6)

is the same as

WHERE id IN (3, 4)
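The equivalence can be checked mechanically. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database rather than PostgreSQL, since only the query semantics matter here; the table name `items` and its rows are hypothetical.

```python
import sqlite3

def run(conn, sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Hypothetical table with ids 1..8; only the id column matters here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)",
                 [(i, f"item {i}") for i in range(1, 9)])

# Two separate IN conditions on the same column...
two_lists = run(conn, "SELECT id FROM items "
                      "WHERE id IN (1, 2, 3, 4) AND id IN (3, 4, 5, 6) ORDER BY id")
# ...versus a single IN list holding their intersection.
intersection = run(conn, "SELECT id FROM items WHERE id IN (3, 4) ORDER BY id")

print(two_lists)     # [3, 4]
print(intersection)  # [3, 4]
```

Both forms return the same rows; the difference discussed below is only in how the planner estimates and executes them.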

PostgreSQL doesn't do that automatically. PostgreSQL may choose to scan an existing index for one or both of the conditions, or it may go with a sequential scan, but the two conditions are not merged.

What is more, PostgreSQL treats the two conditions as statistically independent and simply multiplies their selectivities, which may result in a poor row count estimate.
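A rough worked example with made-up numbers shows how far off this can be. Assume a table of N = 1,000,000 rows with a unique id, and two lists of 10,000 ids each, where every id in :ids is also in :allowedIds (the user filters within what they are allowed to see). Each IN list on its own has a selectivity of roughly 10,000 / 1,000,000, and treating the conditions as independent gives approximately:

$$
s_{\text{est}} \approx \frac{10^4}{10^6}\cdot\frac{10^4}{10^6} = 10^{-4},
\qquad
N \cdot s_{\text{est}} = 10^6 \cdot 10^{-4} = 100 \text{ rows}
$$

The true result in this scenario is 10,000 rows, so the estimate is about 100 times too low, which can lead the planner to pick a plan that only suits small result sets.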

You'll very likely get better estimates, and thus a better plan, with a query that has a single IN list containing the intersection of the original lists.
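If the lists are merged in application code, the readability concern from the question can be confined to one small helper. A minimal sketch, assuming a DB-API driver with `?` placeholders (sqlite3 here; psycopg2, for example, uses `%s`); the function name `fetch_allowed` and the table `items` are hypothetical:

```python
import sqlite3

def fetch_allowed(conn, requested_ids, allowed_ids):
    """Return rows whose id was requested by the user AND is allowed.

    The two lists are intersected once here, so the query carries a
    single IN list and the planner sees one condition instead of two.
    """
    wanted = sorted(set(requested_ids) & set(allowed_ids))
    if not wanted:
        # An empty "IN ()" is a syntax error in PostgreSQL, and the
        # result would be empty anyway, so skip the round trip.
        return []
    # Only placeholders are interpolated; the values are bound as parameters.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"SELECT id, name FROM items WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, wanted).fetchall()

# Hypothetical demo table with ids 1..8.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)",
                 [(i, f"item {i}") for i in range(1, 9)])

print(fetch_allowed(conn, [1, 2, 3, 4], [3, 4, 5, 6]))  # [(3, 'item 3'), (4, 'item 4')]
print(fetch_allowed(conn, [1, 2], [5, 6]))              # []
```

Sorting the merged ids is not needed for correctness; it only makes the generated SQL deterministic, which keeps logs easier to compare.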

You asked why the PostgreSQL team has not added this optimization, so I have a couple of comments about that. This is a rather unusual requirement (I had misunderstood the question at first), and adding optimizer code for such requirements always involves a trade-off: while your query would benefit, many other queries would pay the price in extra CPU cycles spent testing whether the optimization applies. Query planning time is a sensitive area (planning has to be fast), so we are usually reluctant to add special processing for corner cases, particularly ones that can easily be avoided by rewriting the query.

Also of note: you could compute the intersection at the SQL level if you can pass the lists as arrays and install the intarray extension, which provides an & operator that intersects two (integer!) arrays, e.g.:

WHERE id = ANY(:ids & :allowedIds)
