I'm trying to optimize a geolocation query against a table of ~862k address rows, finding locations within a search radius using earth_box. My initial query is not too bad:
explain analyze SELECT
id
FROM
location
WHERE
earth_box(ll_to_earth(40.65130101, -73.83367812), 25000) @> ll_to_earth(latitude, longitude)
;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on location (cost=46.97..3385.08 rows=863 width=16) (actual time=11.430..49.406 rows=29407 loops=1)
Recheck Cond: ('(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube @> (ll_to_earth(latitude, longitude))::cube)
Heap Blocks: exact=22479
-> Bitmap Index Scan on location_gist_lat_lon_idx (cost=0.00..46.76 rows=863 width=0) (actual time=7.942..7.943 rows=29407 loops=1)
Index Cond: ((ll_to_earth(latitude, longitude))::cube <@ '(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube)
Planning Time: 0.711 ms
Execution Time: 51.018 ms
However, when I refine the search with earth_distance to get more accurate results, the execution time increases by 10x:
=> explain analyze SELECT
id
FROM
location
WHERE
earth_box(ll_to_earth(40.65130101, -73.83367812), 25000) @> ll_to_earth(latitude, longitude)
AND earth_distance(ll_to_earth(40.65130101, -73.83367812),
ll_to_earth(latitude, longitude)) < 25000 ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on location (cost=46.83..3820.75 rows=288 width=16) (actual time=11.264..537.018 rows=24898 loops=1)
Recheck Cond: ('(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube @> (ll_to_earth(latitude, longitude))::cube)
Filter: (sec_to_gc(cube_distance('(1347317.9013552233, -4647693.797459679, 4155081.692815027)'::cube, (ll_to_earth(latitude, longitude))::cube)) < '25000'::double precision)
Rows Removed by Filter: 4509
Heap Blocks: exact=22479
-> Bitmap Index Scan on location_gist_lat_lon_idx (cost=0.00..46.76 rows=863 width=0) (actual time=7.358..7.358 rows=29407 loops=1)
Index Cond: ((ll_to_earth(latitude, longitude))::cube <@ '(1322317.9173587903, -4672693.781456112, 4130081.708818594),(1372317.8853516562, -4622693.8134632455, 4180081.6768114604)'::cube)
Planning Time: 0.901 ms
Execution Time: 539.113 ms
(9 rows)
My table schema (excluded some columns):
Table "provider.location"
Column | Type | Collation | Nullable | Default
------------------+--------------------------+-----------+----------+-------------------
id | uuid | | not null |
created_at | timestamp with time zone | | not null | CURRENT_TIMESTAMP
updated_at | timestamp with time zone | | not null | CURRENT_TIMESTAMP
can_delete | boolean | | |
name | text | | |
address | text | | |
address_line_1 | text | | |
address_line_2 | text | | |
city | text | | |
state | text | | |
street | text | | |
zip | text | | |
confidence       | integer                  |           |          |
google_maps_link | text | | |
is_coe | boolean | | |
latitude | double precision | | |
longitude | double precision | | |
I also have a GiST index created for the lat/lon:
"location_gist_lat_lon_idx" gist (ll_to_earth(latitude, longitude))
What am I missing that makes the refined query's execution time increase by 10x?
My postgres 13.5 instance has the following specs:
CPU: 4vCPU
Memory: 16GB
SSD: 250GB
This question is related to "How can I speed-up my query on geo-location processes"; however, following the suggested answer there did not improve my performance.
I don't know what you are missing, other than some perspective.
earth_distance is kind of slow. So computing it 29407 times is slow.
I don't know what from that linked post you tried to do. The suggestion was to create an index, which you already had. References are great, but you still need to tell us what you did.
earth_distance (the module) is mostly for demonstration purposes. Any serious work should be done with PostGIS. In my hands, PostGIS is about 5 times faster at the recheck/filter task than earth_distance is. If you were willing to do some C programming, you could probably make earth_distance quite a bit faster, but I would say that is overkill when PostGIS already exists.
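As a rough sketch of the PostGIS route, assuming the schema shown in the question (the geog column and index names below are my own invention, and longitude comes first in ST_MakePoint):

```sql
-- Assumes PostGIS is installed on the server.
CREATE EXTENSION IF NOT EXISTS postgis;

-- Add a geography column derived from the existing lat/lon values.
ALTER TABLE location ADD COLUMN geog geography(Point, 4326);

UPDATE location
SET geog = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)::geography;

-- A GiST index on the geography column lets ST_DWithin use an index scan.
CREATE INDEX location_geog_idx ON location USING gist (geog);

-- Radius search in metres; ST_DWithin is index-aware, so no separate
-- earth_box bounding-box pre-filter is needed.
SELECT id
FROM location
WHERE ST_DWithin(
    geog,
    ST_SetSRID(ST_MakePoint(-73.83367812, 40.65130101), 4326)::geography,
    25000
);
```

ST_DWithin on geography already combines the index-assisted bounding-box test with the exact distance check, which is the part earth_distance does slowly in the Filter step.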
Another thing you could do is get it to run in parallel. The only way I could find to do this is alter table location set (parallel_workers = 4); I don't know why the planner doesn't pick a parallel plan on its own without that. You might also need to increase max_parallel_workers_per_gather.
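Putting the parallelism suggestion together (the worker counts are illustrative, sized to the 4 vCPUs mentioned above):

```sql
-- Per-table override so the planner considers parallel scans of this table.
ALTER TABLE location SET (parallel_workers = 4);

-- Session-level: raise the per-query worker cap so the workers can be used.
SET max_parallel_workers_per_gather = 4;

-- Re-run and check whether a Gather node with parallel workers appears.
EXPLAIN ANALYZE
SELECT id
FROM location
WHERE earth_box(ll_to_earth(40.65130101, -73.83367812), 25000) @> ll_to_earth(latitude, longitude)
  AND earth_distance(ll_to_earth(40.65130101, -73.83367812),
                     ll_to_earth(latitude, longitude)) < 25000;
```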
Comment from the asker: […] on the location table. Other than that I can't seem to figure out a way to get the query to be faster. I thought the PostGIS extension would be overkill because I'm really just querying for locations within a given search radius. — Jul 29, 2022 at 04:30