Theoretical Computer Science
graph-algorithms big-list dc.parallel-comp near-neighbors
Updated Fri, 10 Jun 2022 11:15:38 GMT

Best algorithm for calculating lists of neighbours


Given a collection of thousands of points in 3D, I need to get the list of neighbours for each particle that fall inside some cutoff value (in terms of euclidean distance), and if possible, sorted from nearest fo farthest.

Which is the fastest parallel algorithm for this purpose?

PS: I also crossposted this in math exchange and SO




Solution

This is the well-studied problem of Nearest Neighbour Search. There is not one 'best solution' to the problem---you'll want to choose trade offs based on the input and requirements.

Do you need the creation of the data structure to be parallel? Or just the queries? Depending on how many thousands of points you're talking about, if your set of points is static it might not be impractical to do something naive like:

  • For each point $x$, construct a list of all other points, sorted by distance from $x$.

This can be done in $O(n^2\log n)$ time and $O(n^2)$ space, which is not that bad if you only need to run it once for a set of several thousand points. It will give you constant or logarithmic query time for pretty much anything you'd want to do.

If you need something smarter that requires less space or less time, I'd recommend reading the Wikipedia article on Nearest Neighbour Search and perhaps refining your question, telling us

  • how many times you'll need to build the structure from scratch
  • whether or not it needs to be dynamic
  • what exactly needs to be parallelized
  • whether your 'cutoff value' for Euclidean distance is fixed, or will vary from query to query
  • if possible, a tighter estimate on the number of points. When somebody says 'thousands' sometimes they mean 3,000 and sometimes they mean 300,000.